ABSTRACT


This thesis was undertaken to examine an acoustical
signal processing test bed similar to the one installed at
the Naval Postgraduate School, to be used primarily for
experimental applications. The major components include two
PDP-11 series computers, at least one array processor, a
mass storage unit and assorted input and display
equipment. Of major interest were the computer selection,
array processor selection and basic signal routing to
facilitate real-time utilization.
I.   INTRODUCTION


The purpose of this study is to begin evaluation of a
proposed signal-processing test bed similar to the test bed
being installed at the Naval Postgraduate School, Monterey,
California. The basic test bed consists of an analog
subsystem (fig 1), data-processing subsystem (fig 2),
signal-processing subsystem (fig 3) and display subsystem
(fig 4) to be used for general-purpose Naval research.

The analog subsystem of the test bed was designed for
signal reception and conditioning. This is basically
accomplished by a 12o-line input into a programmed matrix
switch which emits 32 lines of output. These 32 lines
continue through a program-controlled filter issuing output
from the subsystem.

The signal-processing subsystem receives results from
the analog subsystem via an AM -5*^ 0 0 A/D converter. This
information can then be stored in an Ampex Megastore unit to
be later processed by one MAP-300 array processor. A
PDP-11/34 computer controls the mass storage device, the
array processor and input functions. Output is directed to
the data-processing subsystem.

The data-processing subsystem receives the processed
data and controls the operation of the display subsystem.
Display devices presently include a Ramtek 9500 Video
Display Unit (color and shades of gray), the Versatec 1600A
printer/plotter and an EPC 2300 Gram writer.

The goal of this study was to examine the major system
components, computers, array processors and major data paths
to determine feasibility for various uses and suggest
possible alternative methods, especially in the real-time
environment. The basic task of the test bed was assumed to
be general with no suggestion of specific tasks, although it
was recognized that many uses and data rates may be
utilized.


Chapter II discusses specific computer manufacturers
and computer types. Chapters III, IV and V deal with the
two most popular general-purpose array processors on the
market, discussing the pros and cons of each. Chapter VI
gives final conclusions and recommendations concerning the
proposed test bed.


II.   COMPUTERS 


A.   GENERAL 

For the test bed evaluation, choosing the proper
computer is important since a varying amount of
computational power is required for each subsystem. Also, a
gamut of functions and uses may be tried, necessitating a
system that can realistically emulate many speed, cost and
memory constraints. A common and popular system affords
better software support while still maintaining a low price.
The ability to rely on system support is an important issue
when considering long-term use. A popular system tends to
develop newer, more efficient software packages earlier and
more frequently than do less-used systems.


For large array processing applications with many
display devices the ideal situation would be for one
computer to initially load the array processor and then act
as a "whole system" monitor and statistician. It could also
perform the information gathering function while another
computer would act as the output processor for the array
processor and control the display devices. That situation
would be similar to that of a test bed, where flexibility
may be the key and being computer-bound would be highly
undesirable and could unjustly influence the evaluation
of the array processor. An ultimate goal might be to
choose the smallest computer capable of operating the array
processor and associated display devices in the desired
fashion while providing for product expansion. It is
realized that for test and research activities more
computing power may be necessary than would be needed for
normal production activities.

In October 1975, the Computer Family Architecture
Selection Committee was formed to evaluate computer
architecture candidates as a basis for a family of
software-compatible military computers. Ten Army and 17
Navy organizations were represented on the selection
committee [11]. The purpose was to select an architecture
which could be used as a standard, had a proven instruction
set and could be used in advanced technologies.

B.   PDP-11  FAMILY 


The committee voted that the PDP-11 had the best
architecture for use in the Military Computer Family.
However, it generally contained a small address space and
had possible floating point instruction compatibility
problems with existing systems. The IBM System 370 was
ranked second with the Interdata 8/32 ranked third [12]. The
Digital Equipment Corporation PDP-11 series provides a
popular example of both the price and performance excellence
in available computer systems. Their popularity is evidenced
by the shipping of 10,000 PDP-11/04 and 10,000 PDP-11/34
computers as of 1975 and 1976 respectively [28]. Relevant
PDP-11 computers considered were the PDP-11/04, PDP-11/34,
PDP-11/45, PDP-11/55, PDP-11/60 and the PDP-11/70 (listed
from least powerful to most powerful). What follows is a
brief description of each system. Unless otherwise stated,
it will be assumed that the more powerful system will
contain all the features of systems less powerful. The
PDP-11/05 and the LSI-11 series were not considered due to
their not having the advantages of the UNIBUS [28].


1.   PDP-11/04


The PDP-11/04 is the smallest computer of the PDP-11
series, containing the entire central processing unit on one
board and permitting room for drastic expansion due to unused
chassis area. The system contains self-test logic to
determine system operability every time the processor has
power applied, the console emulator is used or the bootstrap
routines are initiated. The console emulator allows the
operator to control the system from a terminal without
physically throwing switches or reading lights on the front
panel of the unit. The bootstrap loader automatically
restarts the system from various peripheral devices without
need of physical switch throwing. Memory size varies from
8K bytes to 56K bytes (8 bits = 1 byte) of either MOS
(metal oxide semiconductor) or core type with an average
access time of 500 nanoseconds and system cycle time of
725 nanoseconds [30]. A typical cost of this system is
,950 [29].


2.   PDP-11/34

The PDP-11/34 is the next size of the PDP-11 family
and is the lowest architecture to contain a memory
management routine to provide program protection so user
programs cannot access or change system memory space. (In
the 11/04 it is the programmer's responsibility to maintain
and protect this area.) Memory management also allows
virtual memory paging of up to 16 pages ranging in size from
64 bytes to 8K bytes for a total possible memory of 256K
bytes of which 128K is physical. (The highest 4K of address
space on the PDP-11/34/45/55/60/70 is used for registers
that store I/O data or status of individual peripheral
devices. This means that the 11/34 can physically address
124K bytes but virtually address 256K bytes.) The 11/34
allows both core memory and MOS memory to be used
concurrently.


The PDP-11/34 also contains a memory option called
cache memory, which is a 2K high-speed (300-nanosecond cycle
time) memory used to store a copy of the most recently
selected portions of main memory, affording faster access to
instructions and data. The "hit" ratio, or fraction of the
time the next access is resident in cache, is approximately
86 percent for the 11/34. Time is saved because there is
less area to access, and therefore less search time, and
because data transmission is shorter and less complicated.
Since MOS memory is volatile (loses information when power
is removed), the 11/34 has a battery back-up option which
will retain information in the MOS memory for approximately
two hours. The PDP-11/34 can operate in two modes, Kernel
and User. This two-mode concept is important in security
since the User mode is prevented from executing certain
instructions that could cause modification of the Kernel
program, halt the computer or use memory space assigned to
the Kernel or other users. Monitoring and Supervisory
routines are executed in the Kernel mode. The Kernel/User
concept is important since if the Kernel can be made secure,
the overall security of the operating system from
accidental harm is much easier to
achieve. Prices range from $11,040 to $53,800 [29].
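
The benefit of the cache option described above can be estimated with a simple two-level memory model. The sketch below is illustrative only: the 300-nanosecond cache cycle and the hit ratio of roughly 86 percent are read from the text, while the 980-nanosecond main memory figure is borrowed from the core memory quoted for the PDP-11/45 and is an assumption here, as is the model itself.

```python
def effective_access_ns(hit_ratio, cache_ns, main_ns):
    """Average access time for a simple two-level (cache + main) memory model."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits are served at cache speed, misses at main memory speed.
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * main_ns


if __name__ == "__main__":
    # 980 ns is an assumed main memory time, not a figure quoted for the 11/34.
    average = effective_access_ns(0.86, 300.0, 980.0)
    print(f"average access time: {average:.1f} ns")
```

Under these assumed figures the average access time falls well below the main memory time, which is the effect the cache is intended to produce.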

3.   PDP-11/45 


The PDP-11/45 system is designed for speed. The
high-speed central processor allows program execution of
three million instructions per second and has either
300-nanosecond bipolar memory or 980-nanosecond core memory
available. MOS memory is also available as an "add-on"
option. Total memory space is the same as the 11/34. There
is an optional floating point processor to handle double
precision arithmetic. The system is especially good for
multiple-task applications; otherwise it is the same as the
11/34. The price is $41,300 [29].


4.   PDP-11/55


The PDP-11/55 system improves on the 11/45 by
inserting a dual bus structure to allow intermixing core and
bipolar memory (up to ^4dK with memory management) to
optimize system performance. Two separate semiconductor
controllers allow simultaneous data transfer for increased
system throughput. Both the 11/45 and 11/55 hardware have
been optimized towards a multiprogramming environment by
installing a third mode, Supervisor, to control system
operation while properly handling multi-user operations
[30]. The price is $50,400 to $80,780 [29].


5.   PDP-11/60

The PDP-11/60 system is the interface between the
mid-range mini and the more powerful mini. With the 11/60
we see the first capability to microprogram and four levels
of priority interrupts. The system was also designed with
the engineering trade-off between ease of maintenance and
reliability in mind. A system that is very difficult to
repair after failure may be less useful than a system that
is easy to repair but fails more often. The availability of
the system is measured as mean time between failure divided
by the quantity mean time between failure plus mean time to
repair [MTBF / (MTBF + MTTR)] [30]. Digital Equipment
Corporation has tried to allow for a more complex
architecture (probable higher failure rate) by providing a
Reliability and Maintenance Program (RAMP) software package
to help locate software and hardware errors, decreasing the
MTTR and thereby increasing availability. The price ranges
from $42,400 to over $200,000.
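
The availability measure quoted above, MTBF / (MTBF + MTTR), can be evaluated directly. The short sketch below compares two hypothetical systems; the hour figures are invented for illustration and do not describe any PDP-11 model.

```python
def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("times must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


if __name__ == "__main__":
    # Hypothetical figures: one system fails twice as often but is far
    # quicker to repair than the other.
    easy_to_repair = availability(mtbf_hours=1000.0, mttr_hours=1.0)
    hard_to_repair = availability(mtbf_hours=2000.0, mttr_hours=50.0)
    print(f"easy to repair: {easy_to_repair:.4f}")
    print(f"hard to repair: {hard_to_repair:.4f}")
```

With these assumed numbers the system that fails more often is nevertheless the more available of the two, which is the trade-off the paragraph describes.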

6.   PDP-11/70


The PDP-11/70 is the largest of the PDP-11 series
and gives the power of a large computer at the cost ($63,000
to $144,860 [29]) of a minicomputer. It was designed to
operate in high-performance systems and is ideally suited
for real-time systems due to the high speed of execution and
the 80-95 percent "hit" ratio of cache memory. Addressing of
over four megabytes of physical memory is theoretically
possible with the 22-bit addresser, although 256K of this 4M
must be used for the UNIBUS referencing. (The UNIBUS can
only address 18 bits; therefore the memory management
routine must convert the 4-megabyte addresses as if they
were virtual locations.) At the present time, however, only
2M of physical memory can actually be accommodated by the
UNIBUS. There is the option to use 64-bit floating point
numbers in calculations. With two megabytes of main memory
there is little concern for memory constraints in a
multi-task environment. The option of attaching high-speed
mass storage devices to the central processing unit through
dedicated paths is available. The system has eight levels
of priority and a large amount of flexibility in its
programming, making it possible to run several levels of
display devices under varying loading conditions.
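
The address-space figures quoted for the PDP-11/70 follow from the address widths alone. As a small check, assuming byte addressing, a 22-bit address reaches four megabytes while the 18-bit UNIBUS address reaches 256K bytes:

```python
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE


def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 1 << address_bits


if __name__ == "__main__":
    physical = addressable_bytes(22)  # full physical address width
    unibus = addressable_bytes(18)    # UNIBUS address width
    print(f"22-bit space: {physical // MEGABYTE} megabytes")
    print(f"18-bit space: {unibus // KILOBYTE}K bytes")
```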
III.   ARRAY PROCESSOR


An Array Processor is a unit capable of performing
floating point operations on large data arrays or data
streams. It usually operates as a peripheral device to a
"host" computer system and best performs the repetitious
reiterative operations requiring a large number of
summations and multiplications typically encountered in
matrix calculations such as correlations and fast Fourier
transforms. This system is special purpose and cannot
"think" for itself since it has no executive functions
except those necessary to control the mathematics required
to perform additions, multiplications and data movement
[18].

With an array processor, large transforms can be
achieved dependent only on memory capacity. These
transforms can be done faster than in the normal CPU since
the array processor performs only one function at a time
(here function is used in the broader sense as in
transposition) and there is no need for the normal overhead
control logic of a general purpose computer [?']. This is
more advantageous than a special purpose computer in that an
array processor can be programmed to execute various array
processing applications and can also act as a peripheral.
Ideally a system would be wanted that could handle any size
arrays including the possibility of very large arrays if the
situation warranted. This is theoretically possible by using
sequential processing and stringing a series of array
processors together, having each perform a specific
operation. That would only be good, however, for
applications not needing results of data processed in step N
to be used in step N-1. Using one array processor,
efficient and sufficient performance of large arrays is
possible due to the special architecture and memory of the
array processor.


Two general purpose array processors presently seem to
dominate the market. These are the CSP Inc. MAP-300 (Macro
Array Processor) and the Floating Point Systems AP-120B.
While the basic function of each is similar, the actual
operation is quite different.


The theoretical advantages and disadvantages of each
processor will be discussed in detail, comparing
architecture, operational characteristics, software support
and programmability. Chapter VII, Conclusions and
Recommendations, will discuss the actual problems
encountered with the installation of the MAP-300 system to
be used in the evaluation here at the Naval Postgraduate
School.
IV.   THE AP-120B ARRAY PROCESSOR

The AP-120B Array Processor (fig 5) is manufactured by
Floating Point Systems Inc., Portland, Oregon. It operates
synchronously using a 167-nanosecond cycle time master clock
synchronized with a 50 percent safety margin every cycle for
worst-case temperature and voltage. The system uses
pre-conditioned medium-scale integrated circuitry, large-scale
integrated circuitry and transistor-to-transistor logic.
The AP-120B is capable of operating in temperatures from 10
to 40 degrees centigrade at 0 to 90 percent relative
humidity. This processor is also able to operate using one
of these various power options: 105/125 VAC at 120 amps,
160/228 VAC at 10 amps or 210/250 VAC at 10 amps with either
50/60 hertz or 50/400 hertz available [7].

The AP-120B employs a technique known as pipeline
processing to increase throughput. Pipeline processing
utilizes a combination of the elements of both sequential
processing and parallel processing. A single basic
processor, like an adder, is logically divided into integral
units that can each perform a specific and separable
function while another unit of the adder simultaneously
performs another function of the addition task. When one
task is completed, it will move on to the next step in the
sequence, allowing the just-vacated section of the adder to
be filled with the next task in the queue. Throughput is
increased by ensuring that the entire system is always full.
This technique works with both the adder and the multiplier
in the AP-120B. Pipelining is good for vector operations
since vectors are basically independent and a solution of
vector N is not needed before vector N+1 can be started.
However, scalar operations are basically sequential
operations and cannot make use of pipelining [1]. By
carefully considering every operation, especially those in
loops, the programmer can squeeze more operations per time
interval by pipelining than would be possible using standard
sequential techniques. The time is generally limited by the
multiplication time [I'J].

The AP-120B instruction word is up to 64 bits long and
can perform a maximum of ten different operations in a
single cycle. As an example, an add, a multiply, a move to
and from each data pad (there are two) and an address
increment or decrement can all be performed in the same
cycle. Any one instruction or combination of the above can
be performed as long as the resource required is not being
used in another operation (some operations are multi-cycle
and "lock out" the resource until they are complete). It is
the programmer's obligation to ensure that all required
resources are available when they are requested or else they
will be lost [7]. As an example, a read from a data pad
takes at least two cycles. If cycle N wanted to read from
Data Pad X and cycle N-1 already initiated a read from Data
Pad X, the entire instruction word for cycle N would be
delayed one cycle waiting for the resource to become
available. This ability to perform more than one basic
operation per cycle allows a theoretical 30 million
instructions per second to be executed. Due to memory size
limitations and algorithms not needing ten operations per
instruction word for sustained periods, this rate can never
be fully attained except possibly for short bursts [36].
Since some of these operations are housekeeping functions,
the maximum number of arithmetic operations per second
theoretically possible is twelve million for vectors and
five million for scalars (scalar speed is much lower since
it requires sequential processing and cannot take advantage
of pipelining) [1].
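
The twelve-million figure for vectors is consistent with one add and one multiply result emerging on every 167-nanosecond cycle. The sketch below checks that arithmetic; treating the vector peak as exactly two arithmetic results per cycle is an assumption made here for illustration, not a statement taken from the manufacturer.

```python
CYCLE_NS = 167.0  # master clock cycle time quoted for the AP-120B


def cycles_per_second(cycle_ns):
    """Clock cycles per second for a given cycle time in nanoseconds."""
    if cycle_ns <= 0:
        raise ValueError("cycle_ns must be positive")
    return 1e9 / cycle_ns


def peak_arithmetic_rate(cycle_ns, results_per_cycle=2):
    """Peak arithmetic results per second (assumed: one add plus one multiply per cycle)."""
    return cycles_per_second(cycle_ns) * results_per_cycle


if __name__ == "__main__":
    rate = peak_arithmetic_rate(CYCLE_NS)
    print(f"peak vector rate: {rate / 1e6:.2f} million operations per second")
```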

The AP-120B uses a 38-bit data word which Floating
Point Systems contends generates better accuracy than the
32-bit word commonly used by other systems [7]. This 38-bit
word consists of a ten-bit biased exponent and 28-bit two's
complement mantissa, thereby allowing numbers in a range of
3.7 * 10 ** -155 to 6.7 * 10 ** 153 to be represented. The
28-bit mantissa allows for extensive calculations without
significant truncation errors: the maximum relative error is
approximately 7.5 * 10 ** -9 per arithmetic operation, or
about 8 decimal digit accuracy. Floating Point Systems Inc.
also employs a technique known as convergent rounding which
they assert forces the roundoff error to approach zero.
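
The range and accuracy quoted for the data word can be reproduced from the field widths. The sketch below assumes an exponent bias of 512 and a mantissa normalized as a fraction between 0.5 and 1; both are assumptions chosen because they reproduce the quoted figures, not details confirmed by the source.

```python
EXPONENT_BITS = 10
MANTISSA_BITS = 28                 # two's complement: 1 sign bit + 27 magnitude bits
BIAS = 1 << (EXPONENT_BITS - 1)    # 512, an assumed bias


def smallest_positive():
    """Smallest normalized fraction (0.5) at the most negative exponent."""
    return 0.5 * 2.0 ** (0 - BIAS)


def largest_positive():
    """Largest fraction (just under 1) at the most positive exponent."""
    largest_fraction = 1.0 - 2.0 ** -(MANTISSA_BITS - 1)
    largest_exponent = (1 << EXPONENT_BITS) - 1 - BIAS
    return largest_fraction * 2.0 ** largest_exponent


def relative_error():
    """Weight of the last mantissa bit relative to a normalized fraction."""
    return 2.0 ** -(MANTISSA_BITS - 1)


if __name__ == "__main__":
    print(f"smallest: {smallest_positive():.2e}")
    print(f"largest:  {largest_positive():.2e}")
    print(f"relative error: {relative_error():.2e}")
```

Under these assumptions the three values come out near 3.7 * 10 ** -155, 6.7 * 10 ** 153 and 7.5 * 10 ** -9 respectively, in agreement with the text.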
The AP-120B does not contain the normal bus structure
of other array processors but instead uses dedicated 38-bit
data paths for the movement of data. There are two paths
available to the adder (one for each input register), two
paths to the multiplier and three paths available to the
memory and data pads. This allows seven independent data
words to be transferred each cycle. (This, coupled with an
add, multiply and address increment/decrement, equals the
ten instructions per cycle possible.) These separate data
paths eliminate the need for a handshaking arrangement
between logic elements, although handshaking is required
when the AP-120B communicates with the host [7,36].


The price of a unit which includes the AP-120B array
processor, interface with the PDP-11, 16K words of
533-nanosecond interleaved MOS memory, expansion chassis,
installation, 256 words of program source memory, 512 words
of Read Only Memory (ROM) table memory, a linker, loader,
simulator, debugger, algorithm library and executive is
$50,270.00 [10]. This includes a 90-day warranty with a
servicing agreement available at extra cost. The field test
mean time between failure is 3500 hours [3].

The following section explains the hardware of the
AP-120B in detail.


A.   CHARACTERISTICS AND HARDWARE


1.   Multiplier


The Multiplier unit (fig 6) consists of two 38-bit
multiplier registers M1 and M2, three multiplication stages
and a 38-bit register to store the result (FM). To receive
a resultant after initiating the multiply, three cycles or
500 nanoseconds are required. Inputs to the M1 register can
come from Data Pad X (DPX), Data Pad Y (DPY), Table Memory
(TM) or the Multiplier result register (FM). Inputs to M2
are either from DPX, DPY, Adder result register (FA) or Main
Data Memory Output Buffer (MD). Results from the multiplier
can go to M1, the Adder input register (A1), Main Data
Memory input buffer (MI), DPX or DPY.


Stage one of the multiplier starts the product of
fractions by beginning the multiplication of the two 28-bit
mantissas. This multiplication is completed in stage two,
resulting in a 56-bit mantissa. Stage three adds the
exponents as it normalizes and convergently rounds the
56-bit mantissa to 28 bits. This stage also detects
exponent overflow/underflow and if either exists will set the
FO or FU bit in the status register. The status register
can be read by the program to determine if conditions are
met from an arithmetic operation, to specify errors, or to
be used in branching logic. These bits are available for
testing one cycle after completion of the multiply.
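
The rounding step in stage three can be illustrated on plain integers. "Convergent rounding" is commonly taken to mean rounding to nearest with ties going to the even neighbor; that reading is an assumption here, and the sketch below is not a description of the actual hardware. It handles non-negative values only and does not renormalize if the rounded value overflows the kept width.

```python
def round_convergent(value, drop_bits):
    """Drop the low drop_bits bits of a non-negative integer, rounding ties to even."""
    if value < 0 or drop_bits < 0:
        raise ValueError("value and drop_bits must be non-negative")
    if drop_bits == 0:
        return value
    kept = value >> drop_bits
    remainder = value & ((1 << drop_bits) - 1)
    half = 1 << (drop_bits - 1)
    # Round up when above the halfway point, or exactly halfway with an odd result.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept


if __name__ == "__main__":
    # Small cases: drop two bits from four-bit values.
    for value in (0b1001, 0b1010, 0b1011, 0b1110):
        print(f"{value:04b} -> {round_convergent(value, 2)}")
    # Same rule applied to a 56-bit product reduced to 28 bits.
    product = (0x1234567 << 28) | (1 << 27)
    print(hex(round_convergent(product, 28)))
```

Because exact ties alternate between rounding up and rounding down, the errors tend to cancel over many operations rather than accumulate in one direction.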


This three-stage multiply allows pipelining to be
used since each stage is independent of the other two, which
permits a multiplication result to be present at the result
register every 167 nanoseconds once the pipeline becomes
full (three cycles required to fill). Note that
500 nanoseconds are required if the result of the
multiplication is required in the next multiplication, as is
the case with scalar arithmetic.
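
The difference between the pipelined and the scalar case can be put in numbers using the three-stage, 167-nanosecond figures above. The sketch below is a simplified timing model that ignores data movement and any resource conflicts; the function names are illustrative.

```python
CYCLE_NS = 167.0  # cycle time quoted for the AP-120B
STAGES = 3        # multiplier stages


def pipelined_ns(n_ops, stages=STAGES, cycle_ns=CYCLE_NS):
    """Independent multiplies: fill the pipeline once, then one result per cycle."""
    if n_ops <= 0:
        return 0.0
    return (stages + n_ops - 1) * cycle_ns


def chained_ns(n_ops, stages=STAGES, cycle_ns=CYCLE_NS):
    """Each multiply needs the previous result, so each pays the full latency."""
    if n_ops <= 0:
        return 0.0
    return n_ops * stages * cycle_ns


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(f"{count:4d} multiplies: pipelined {pipelined_ns(count):8.0f} ns, "
              f"chained {chained_ns(count):8.0f} ns")
```

For long vectors the pipelined time approaches one cycle per result, while the chained (scalar) time stays at three cycles per result.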

A readily apparent problem with the multiplier is
that M1 receives inputs from both the Table Memory (TM) and
the Multiplier Result register (FM) while M2 receives inputs
from neither. Therefore, if a constant from TM were to be
multiplied by the result of a just-completed multiplication,
it would require an extra two cycles since either FM or TM
would first have to be written into DPX or DPY and then
written into M2. This disadvantage is overshadowed by the
fact that even though dedicated data lines cause the above
problem, in most cases they present a distinct advantage by
allowing multiple data transfers in any given cycle [32].

2.   Adder


The operation of the adder (fig 7) is similar to
that of the multiplier and consists of two 38-bit adder
registers A1 and A2, two adder stages and an adder result
register (FA). The addition of two numbers requires
333 nanoseconds (two cycles). Inputs to A1 are from Table
Memory (TM), Multiplier Output register (FM), Data Pad X
(DPX), Data Pad Y (DPY) and the ZERO constant while inputs
to A2 are from the Adder Output register (FA), Data Pad X
(DPX), Data Pad Y (DPY) and the ZERO constant. The results
from the adder can go to A2, M2, DPX, DPY or MI. Stage one
aligns the mantissas by shifting the smaller value, based on
the value of the exponent, to the right until both exponents
are equal, then adding or subtracting these mantissas. Stage
two normalizes and convergently rounds the mantissa and
adjusts the exponent. This stage also sets four bits in the
status register to denote results equal zero (FZ), results
less than zero (FL), exponent overflow (FO) or exponent
underflow (FU). These bits may be tested by other program
instructions one cycle after the addition is completed.
(Note that FO and FU are the same bits that are set by the
multiplier on exponent overflow or underflow.)
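
The two adder stages can be mimicked on (exponent, mantissa) pairs. The sketch below is a simplified model under stated assumptions: mantissas are ordinary signed integers rather than two's complement fields, the value of a pair is mantissa * 2**exponent, bits shifted out are simply discarded rather than convergently rounded, and exponent overflow/underflow is not checked. The function names are hypothetical.

```python
MANTISSA_BITS = 28


def fp_add(a, b):
    """Add two (exponent, mantissa) pairs, each representing mantissa * 2**exponent."""
    (exp_a, man_a), (exp_b, man_b) = a, b
    # Stage one: shift the operand with the smaller exponent right, then add.
    if exp_a < exp_b:
        man_a >>= exp_b - exp_a
        exponent = exp_b
    else:
        man_b >>= exp_a - exp_b
        exponent = exp_a
    total = man_a + man_b
    # Stage two: normalize so the magnitude fits in 27 bits with its top bit set.
    limit = 1 << (MANTISSA_BITS - 1)
    while abs(total) >= limit:
        total >>= 1
        exponent += 1
    while total != 0 and abs(total) < (limit >> 1):
        total <<= 1
        exponent -= 1
    return exponent, total


def status_bits(mantissa):
    """Zero and negative flags in the spirit of FZ and FL."""
    return {"FZ": mantissa == 0, "FL": mantissa < 0}


if __name__ == "__main__":
    print(fp_add((0, 1 << 26), (0, 1 << 26)))   # sum overflows, shifted right once
    print(fp_add((0, 3 << 25), (-2, 1 << 26)))  # smaller operand aligned first
    print(status_bits(fp_add((0, 5), (0, -5))[1]))
```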

As with the multiplier, the two-stage adder allows
pipelining and a result can be generated every
167 nanoseconds. The adder does not have the disadvantage
of inputting Table Memory (TM) values at the same register
as FA but does have the multiplier result FM at the same
adder input register (A1) as TM values. There is therefore
not the ability to immediately add an FM value with a TM
value without first going through DPX or DPY [32].

For both the adder and the multiplier there would be
a two-cycle time loss if FM was just loaded with a new value
from the multiplier when it was needed for the
addition/multiplication process (time U) and only a one
cycle loss if it was ready the cycle before needed (time U -
1 cycle). Otherwise there would be no loss of time since
steps could be taken to move the value in FM through the DPX
or DPY, which would make it available at the
adder/multiplier input register when necessary.
(Presupposing of course that the data paths to or from
memory were not needed for other uses.)

3.   S-Pad

The S-Pad (fig 8) (pseudonym for scratch pad)
consists of the S-Pad Memory, S-Pad Arithmetic Logical Unit
(ALU), Data Pad Address Register (DPA), Memory Address
Register (MA) and the Table Memory Address Register (TMA).
The sole purpose of the S-Pad is to compute addresses for
Table Memory, Main Data Memory and the Data Pads. The S-Pad
can operate concurrently with the memories, Multiplier and
Adder [7].

The S-Pad Memory is made up of 16 registers each 16
bits wide, giving the ability to compute an effective address
of 64K. These registers may be assigned label names like
"pointer" by the use of pseudo-operators, to make programs
more readable, or may be directly addressed by number.

The S-Pad Arithmetic Logical Unit forms the operand
addresses and also automatically loop counts, shifts the
addresses left once (multiply by two), shifts the addresses
right once (divide by two) or right twice (divide by
four). There is also the ability, if required, of bit
reversal, to swap bits while accessing data in a scrambled
order after a Fast Fourier Transform. The results of the
S-Pad arithmetic logic unit, called SPFN, set bits in the
status register to indicate whether the results were less
than zero (N), zero (Z) or if there was a carry bit (C).
These bits are available for testing by program instructions
at the next instruction cycle.
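
Bit reversal of an address is what puts Fast Fourier Transform output, left in scrambled order by the algorithm, back into natural order. The sketch below shows the index mapping in software; it illustrates the general technique and is not a model of the S-Pad hardware, and the function names are illustrative.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (e.g. 001 -> 100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def unscramble(data):
    """Return data reordered from bit-reversed order into natural order."""
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]


if __name__ == "__main__":
    scrambled = [0, 4, 2, 6, 1, 5, 3, 7]  # eight points in bit-reversed order
    print(unscramble(scrambled))
```

Performing the reversal as part of address computation means the data can be read out in natural order without a separate reordering pass.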

TMA, DPA and MA store the computed address from the
S-Pad ALU. The contents of each can either be changed by
the value of SPFN or incremented by one. One cycle is
required to compute the address and load it into the proper
register [32].

4.   Table Memory

Table memory is a 512 word, 38-bits per word bipolar
read-only memory used to store important and much used
constants. This memory has a 167-nanosecond cycle time but
requires two cycles to get the value from memory to the
output register TM [7]. Values in TM are available for use
by DPX, DPY, MD, M1 and A1. These values may be requested
every machine cycle and are initiated by changing the
contents of the Table Memory Address Register (TMA) in the
S-Pad. The programmer must control the timing necessary to
ensure the correct constant is at TM when needed due to the
2 cycle access time requirement.
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In  the  Fast  Fourier  Transform  f^ode/  the  address  in 
TMA  is  interpretted  by  the  hardware  to  be  the  angle  which 
ooints  to  the  appropriate  root  of  unity  for  a  particular 
step  in  the  FFT  aloarithnn.  Therefore^  in  a  single  auaarant 
of  cosines^  a  full  table  can  be  represented  f32]. 
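
As an illustration of why a single quadrant of cosines is
sufficient, the following sketch rebuilds every root of unity
from a table covering only 0 to 90 degrees. It is written in
Python rather than AP-120B code, and the transform size N and
all function names are assumptions chosen for the example;
it does not describe the actual Table Memory layout.

```python
import cmath
import math

N = 64          # assumed transform size (must be divisible by 4)
Q = N // 4      # number of table steps in one quadrant

# Quarter-wave table: cos(2*pi*k/N) for k = 0..Q (0 to 90 degrees).
QUARTER = [math.cos(2 * math.pi * k / N) for k in range(Q + 1)]

def cos_from_table(k):
    """Return cos(2*pi*k/N) using only the quarter-wave table."""
    k %= N
    if k <= Q:                      # first quadrant: direct lookup
        return QUARTER[k]
    if k <= 2 * Q:                  # second quadrant: mirror and negate
        return -QUARTER[2 * Q - k]
    if k <= 3 * Q:                  # third quadrant: shift and negate
        return -QUARTER[k - 2 * Q]
    return QUARTER[N - k]           # fourth quadrant: mirror

def sin_from_table(k):
    """sin(x) equals cos(x - 90 degrees): shift by one quadrant."""
    return cos_from_table(k - Q)

def root_of_unity(k):
    """exp(-2*pi*i*k/N), the factor used by a forward FFT."""
    return complex(cos_from_table(k), -sin_from_table(k))
```

The index k plays the role the text assigns to the address in
TMA: it is treated as an angle, and symmetry supplies the
three quadrants that are not stored.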

There is an optional Random Access Table Memory
(TMRAM) containing 1K of random access memory [8]. This
allows loading of special constants necessary for special
applications without the overhead of computing them every
time or using valuable data pad space to store them. The
price of this option is approximately $1850.00 [7].

5.   Data Pad X and Y

The Data Pads (fig ^) consist of sixty-four 38-bit
accumulators, four of which are available from the 16
addressable each instruction cycle [7]. These 64
accumulators are divided into two 32-register blocks called
Data Pad X (DPX) and Data Pad Y (DPY). From each Data Pad,
one register can be read and another written during the same
cycle.

The restrictions are that the same register cannot
be read and written simultaneously and that a read and write
operation during the same cycle must occur on registers
whose addresses differ by no more than 7 due to
base-address-plus-offset addressing. (However, a register in
DPX may be written at the same time as a register in DPY even
if they both have the same address.) In the S-Pad, the Data
Pad Address Register (DPA) supplies the base address to be
used by the read/write instruction to locate the proper Data
Pad register. The DPA supplies both DPX and DPY concurrently.
The instruction uses this base address and an offset in the
form DPX(offset) or DPY(offset) and can address -4 to +3
offset from the base in each Data Pad to find the effective
address. Therefore if the DPA contains decimal value 20,
registers 16, 17, 18, 19, 20, 21, 22 and 23 can be addressed
in each data pad. The register addresses of both Data Pads
range from 0 to 37 (base 8) and are arranged in a circular
addressing scheme. Therefore 37 (base 8) + 1 = 0 and the
programmer need not be concerned about writing into a
non-existent location but must only be concerned with
overwriting previously written information.
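
The base-plus-offset scheme just described can be sketched in
a few lines. This is an illustrative Python model only; the
function names are assumptions and the model covers nothing
beyond the offset range and wrap-around stated above.

```python
NUM_REGS = 32   # registers per Data Pad, addresses 0 to 37 (base 8)

def effective_address(dpa, offset):
    """Register selected by DPX(offset) or DPY(offset) for a given DPA."""
    if not -4 <= offset <= 3:
        raise ValueError("offset must lie between -4 and +3")
    # Circular addressing: 37 (base 8) + 1 wraps around to 0.
    return (dpa + offset) % NUM_REGS

def reachable(dpa):
    """All registers addressable in one Data Pad for this DPA value."""
    return [effective_address(dpa, off) for off in range(-4, 4)]
```

For a DPA of decimal 20, reachable(20) lists registers 16
through 23, matching the example in the text. The largest
distance between any two reachable registers is 7, which is
the read/write restriction quoted above.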

DPX and DPY receive information from MD, FA, FM,
DPX, DPY, output of the S-Pad arithmetic logical unit (SPFN)
and VALUE (an immediate value used by immediate instructions
arriving from the command buffer). DPX and DPY supply
values to M1, M2, A1, A2, DPX, DPY and MI [32].

6.   Main Data Memory

Main Data Memory (fig 10) contains 64K 38-bit words
used primarily to store input data which will be operated
on by the program. This memory is available in two forms,
167-nanosecond hardware interleaved MOS with -^K word
segments or 333-nanosecond hardware interleaved MOS with 8K
word segments. Both memories have a two bit parity option
available [7] and a one megaword page selection option [9].
With memory limited to 64K, the largest complex-to-complex
Fast Fourier Transform possible is 32K, which may not be
acceptable in some applications.

Main Data Memory receives input information into its
Memory Input Buffer (MI) from FA, FM, MD, DPX, DPY, TM, SPFN
and VALUE. It can output via the Memory Data Buffer to DPX,
DPY, A2 and M2.

Memory read or write may be requested every other
cycle by changing the value of the Memory Address Register
(MA) in the S-Pad. This yields an effective memory cycle
time of either 333 nanoseconds (167 nanoseconds plus one
machine cycle) or 500 nanoseconds (333 plus one machine
cycle) dependent on the type of memory installed [3^]. By
special programming techniques and proper chip procurement,
this overhead can be reduced to the advertised memory speed
with the restrictions that the memory alternate between
chips or alternate between even and odd boundaries. If
effective speed is essential, it becomes the programmer's
responsibility to insure data location is known to the
program at all times [8]. A read requires three cycles for
information to be present in the MD if using 333-nanosecond
memory and two cycles if using 167-nanosecond memory. This
information will be available until a new value overwrites
it. If a write or read is initiated before two memory
cycles (unless special chips and techniques of above are
used), the request will not be lost but the memory will
automatically provide a hardware lockout (wait until memory
available for read/write) [l^^].

The value in the Memory Address Register (MA) points
to the desired location in main data memory. MA may be
either set to a specific value or incremented/decremented by
one in the S-Pad. Since there is a slight time lag between
when a value is requested to be placed in MD and when it
actually gets there, the programmer must always be aware of
what values are in MI and MD, to allow the proper "set up"
time to get these values to either the Adder, Multiplier or
correct DPX, DPY or MI address [32].

7.   Program Source Module


The Program Source Module (fig 11) consists of the
Program Source Memory (PS), Program Source Address Register
(PSA), Control Buffer (CB) and the Subroutine Return Stack
(SRS) [12].


The PS is a high speed, 50-nanosecond, bipolar
memory addressable to 7K 64-bit words and is available in
256 word increments [^]. The PSA contains the address of
the next instruction and is incremented by one after
instruction execution unless modified by either the Control
Buffer (new address as a result of a branch or jump
instruction) or the Subroutine Return Stack. The SRS saves
the current PSA when a Jump Subroutine instruction is
performed and increments the value of the Subroutine Return
Address (SRA). When a Return instruction is performed, the
SRA is decremented by one, making nested subroutines
possible. The Control Buffer decodes and executes the
instruction as the CPU would in a general purpose computer
[12].
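
The interplay of PSA, SRS and SRA can be pictured with the
small Python model below. It is a sketch under stated
assumptions, not a description of the hardware: the stack
depth of 16 and the choice to save PSA + 1 as the return
address are both assumed for illustration, and the class and
method names are hypothetical.

```python
class ProgramSourceModel:
    """Toy model of PSA plus a subroutine return stack."""

    def __init__(self, stack_depth=16):   # depth is an assumption
        self.psa = 0                      # address of next instruction
        self.srs = [0] * stack_depth      # Subroutine Return Stack
        self.sra = 0                      # Subroutine Return Address index

    def step(self):
        """Ordinary instruction: PSA is incremented by one."""
        self.psa += 1

    def jump_subroutine(self, target):
        """Save a return address, bump SRA, then branch."""
        if self.sra == len(self.srs):
            raise OverflowError("return stack full")
        self.srs[self.sra] = self.psa + 1  # assumed: resume after the call
        self.sra += 1
        self.psa = target

    def return_(self):
        """Decrement SRA and resume at the saved address."""
        if self.sra == 0:
            raise IndexError("return without matching call")
        self.sra -= 1
        self.psa = self.srs[self.sra]
```

Because each call uses a fresh stack slot, a subroutine may
call another and both returns still land in the right place.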

8.   Interface with PDP-11 Series


The interface unit with the PDP-11 series contains
two major segments, the Front Panel and the DMA Controller
and Formatter. The Front Panel contains three registers and
is used mainly as a debugging aid while the DMA Controller
and Formatter contains five registers and is used for
program and data entry or removal.

a.   Front  Panel 

The Front Panel (fig 12) consists of three
16-bit registers, the Switch Register (SWR), the Lights
Register (LITES) and the Function Register (FN). The Front
Panel is used for bootstrapping and debugging of user
programs. These three registers can be examined by the host
and take the place of the toggle switches normally on the
front panel of the console [32]. With the use of the
Debugger program, these registers can effectively breakpoint
the AP-120B at a selected program location or data address.
This Front Panel allows each program to be single-stepped
through its execution sequence [6,7].

The Switch Register is written by the host
computer but can be read by either the AP-120B or the host.
The SWR is used to enter data and addresses into the
AP-120B, primarily for debugging. Its contents can be fed
to the DPX, DPY, MD or the S-Pad.

The Lights Register simulates the front panel
lights of the console. This register is set by the AP-120B
and can only be read by the host. LITES is used to display
selected contents of the internal registers of the AP-120B.

The final register is the Function Register
which provides front panel toggle-like controls to the
AP-120B. The FN can stop, start, step or reset the AP-120B.
It can also continue operation resuming at the current value
of the PSA, examine a register, examine a portion of a
register or memory contents of a selected area, deposit the
contents of SWR into a selected register or memory location
and then breakpoint according to the values of TMA, MA or
DPA. The FN can also increment the TMA, MA or DPA after
completion of an instruction to facilitate stepping through
memory locations [32].

The Front Panel is advertised to be invaluable
in troubleshooting when used in conjunction with the
interactive Debugger routine.

b.   DMA Control

The DMA Control is the second half of the
interface and consists of three 16-bit registers, one 18-bit
register and one 38-bit register. DMA Control is
responsible for transferring programs and data between the
AP-120B and the host computer. This section of the
interface will also do format conversion "on the fly" which
should effectively alleviate time lags [32]. Four types of
data transfer combinations are possible: host DMA to AP-120B
DMA, host DMA to AP-120B Programmed I/O, host Programmed I/O
to AP-120B Programmed I/O and host Programmed I/O to AP-120B
DMA, with a maximum theoretical burst transfer rate of three
megawords per second for all types of transfers [7].

The Format Register (FMT) is a 38-bit double-
buffered register used to perform all transfers of
floating-point numbers from the host to the AP-120B [12].
The FMT will convert 16-bit integer numbers to 38-bit
unnormalized floating-point numbers, 32-bit PDP-11 integers
to 32-bit AP-120B integers and 32-bit floating-point numbers
to 38-bit floating-point numbers. All these operations are
in reverse for the AP-120B to host direction [7]. Since the
PDP-11 is a 16-bit computer, it will access the Formatter in
16-bit half-words to be compatible. It must be noted that
for some applications, such as difference filtering, there
is a possibility of extreme accuracy loss due to 16-bit
integer to 38-bit floating-point conversion. The synthetic
precision generated by such a conversion can cause certain
coefficient combinations, such as +1 and -1, when
multiplied by mirrored arrays, to result in errors when
reconverted to 16-bit format. The programmer must be aware
of these possible losses and test for them before faith is
placed in the result.

The AP Direct Memory Address Register (APDMA)
points to consecutive locations in AP-120B Main Data Memory
during DMA transfers. This register can be automatically
incremented/decremented allowing blocks of information to be
read into consecutive locations with minimal overhead.

The Host Memory Access Register (HMA) operates
similarly to the APDMA except it points to consecutive memory
locations in the host memory. In the PDP-11 this memory is
256K so the HMA is 18 bits wide to allow for this addressing
capability.

The Word Count Register (WC) counts the number
of words transferred during a DMA operation. This register
must be preset to the required number of words and will stop
DMA transfer when the prescribed number of words is
transferred.
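
How APDMA, HMA and WC cooperate during a block transfer can
be sketched as follows. This Python model is illustrative
only: the function name, the use of plain lists for the two
memories and the omission of any format conversion are all
assumptions made to keep the example short.

```python
APDMA_MASK = (1 << 16) - 1   # APDMA is a 16-bit register
HMA_MASK = (1 << 18) - 1     # HMA is 18 bits wide (256K host memory)

def dma_host_to_ap(host_mem, ap_mem, hma, apdma, wc, step=1):
    """Copy wc words from host memory into AP Main Data Memory.

    hma and apdma are the starting addresses; step is +1 or -1 to
    model automatic increment or decrement of APDMA. Returns the
    final (hma, apdma, wc) register values.
    """
    while wc > 0:
        ap_mem[apdma & APDMA_MASK] = host_mem[hma & HMA_MASK]
        hma += 1          # host side walks consecutive locations
        apdma += step     # AP side auto-increments or decrements
        wc -= 1           # transfer stops when the count reaches zero
    return hma, apdma, wc
```

Once the three registers are preset, the loop needs no further
attention from either processor, which is the "minimal
overhead" property noted above.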

The final and most important register in the
interface is the Control Register (CTL). It controls the
direction and mode of transfer, type of format conversion
and provides certain status bits pertaining to the transfer.
This register, with the use of HMA and/or APDMA, allows the
host to execute other programs and be interrupted when the
DMA is completed. The CTL also allows either the host or
AP-120B to control the data transfer. (The AP-120B must
control transfer from a loaded program since the executive
alone is not powerful enough to control data transfer [12].)

B.   SOFTWARE

Various software support, executive and development
programs are available with the AP-120B.

1.   Executive  and  Associated  Routines 

The AP-120B provides executive and housekeeping
routines to increase the effectiveness of operation and
enhance program development.

a.   APMATH

APMATH is a series of approximately 150 [8]
library functions, vector and matrix subroutines and signal
processing algorithms [7] written in AP-120B assembly
language [Ml. These routines are callable from either host
Fortran, host Assembly or AP assembly languages [36] with
the use of the AP Executive. These programs can reduce the
run time and decrease programming time by presenting some of
the most common array processing functions in subroutine
callable form. These routines include data transfer and
control, basic vector arithmetic, matrix operations and Fast
Fourier Transform, all of which are able to work with both
real and complex data.

b.  APEX 

APEX is the AP Executive routine which is
resident in the host computer and allows the AP-120B to
communicate with the host computer via Fortran or host
Assembly language calls. APEX decodes subroutine calls from
the host computer [36] and directs the AP-120B to perform
the specified action. Both APMATH routines and user written
routines may be called by the AP-120B from the host computer
[3?].

c.  APAL 

The AP Cross Assembler (APAL) is a two pass
assembler written in Fortran IV which requires d.^\s memory in
the host computer to operate. APAL assembles source text
written in AP Assembly language into object code
understandable by the AP-120B. The assembler also
optionally produces an AP Assembly listing containing errors
in both passes, location counters, assembled data, the
symbol table and source statements.

APAL recognizes signed constants ranging from
-32768 to 32767 and unsigned constants from 0 to 65535, both
of which may be represented in binary, octal (default base),
decimal or hexadecimal. It allows free formatting but
recognizes the general source statement form: optional
label followed by a colon, multiple op codes separated by
semicolons (one to ten operations which total no more than
64 bits. Sixty-four bits is the maximum dictated by seven
data transfers, one add, one multiply and one address
increment/decrement), and an optional comment statement
denoted with a leading double quote (").

Once the modules are written, APAL can be
operated dynamically, allowing the programmer to build the
program at assembly time. APAL will question the operator
about the source file name, destination file name etc. and
subsequently will prompt him concerning missing items. If
there are errors in the module, these can be changed
dynamically without reassembling the entire module [4].

d.   APLINK

The AP linker (APLINK) is written in Fortran IV
and requires approximately 10K of memory in the host
computer. APLINK performs functions similar to those of
any other link editor, which include relocation and assigning
absolute addresses to the object module, correlation of
global entry symbols in one module with external symbols in
the other modules, loading the module from the program
library and production of the final load module. These
functions are performed interactively with dialogue between
APLINK and the user at the console.

Besides linking the modules, APLINK returns to
the console any symbols in a file which are undefined, will
output the symbol table and locations when requested and
returns the high address and starting address to be used
with the Debugger routine [5].

e.  APSIM

APSIM is the AP-120B simulator and is designed
to be used when developing programs when use of the AP-120B
is impractical or impossible due to production schedules.
APSIM emulates all hardware and timing characteristics of
the AP-120B as well as performing the mathematical routines
as closely as possible to the way the AP-120B would perform
them [3^1. APSIM requires 32K words of memory in the host
computer [1].

f.  APDEBUG

APDEBUG is the AP-120B interactive debugger
program to be used for dynamic debugging of AP-120B
applications programs at run time. Changes can be made when
the problem is identified and APDEBUG will call the APLINK
and APAL routines to insert the new object module, then
continue with program development. APDEBUG can work in
conjunction with the simulator or the actual hardware [6].


g.   Testing Software

There are three software modules available to
completely test the AP-120B hardware operations.

APTEST is the AP-120B path tester. This
software exercises the panel, DMA interface, internal
registers and memory to check for proper operation.

APPATH tests the internal data paths of the
AP-120B and returns diagnostics upon finding any errors.

Forward/Inverse Fast Fourier Transform Test
(FIFFT) verifies correct operation of the AP-120B's
arithmetic units by performing Fast Fourier Transforms,
inverting them and comparing results with standard answers [32].
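
The idea behind a forward/inverse transform test can be shown
with a short Python sketch. It is not the FIFFT program: the
recursive radix-2 routine and the function names are
assumptions, and the round trip is checked against the
original input rather than against stored standard answers.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor: a root of unity for this butterfly stage.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def round_trip_error(x):
    """Largest deviation after a forward then inverse transform."""
    n = len(x)
    back = [v / n for v in fft(fft(x), inverse=True)]
    return max(abs(a - b) for a, b in zip(x, back))
```

A faulty adder or multiplier would show up as a round-trip
error far above the rounding noise of the arithmetic.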

These packages can be used to help insure proper
operation of the AP-120B before development or actual
operation and also help with the hardware fault locating
effort during system maintenance.

2.   Programming Language

The Math Library of AP functions can be called by
the host Assembly Language, Fortran or the AP Assembly
Language [36]. However, to write a custom library function,
AP Assembly Language must be used and the cross-assembler
will translate it into an executable routine.


Investigating the programming language is not
important here except to say that it is similar in
characteristics to other assembly languages. There are
sufficient commands available to write a program to properly
control AP-120B execution in an efficient manner. Bit
testing, conditional branching, flag setting and arithmetic
instructions all are part of the instruction repertoire
which allows varied applications programs to be written.

3.   Page Select Option


The AP-120B can alternatively be equipped with a
Page Select Option. This provides the ability to address
one megaword of main memory in the AP-120B by using host
main memory and virtual memory techniques. Each page can be
up to 64K words long (full Main Data Memory size, but each
page must be at least 8K) and 16 pages are available. The
Page Select Option increases the ability of the AP-120B to
work on larger transforms, but due to paging overheads it
may not increase the throughput rate due to increased host
involvement.
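
Sixteen pages of 64K words each give 16 x 65536 = 2**20
words, or one megaword. A one-megaword address can therefore
be viewed as a 4-bit page number followed by a 16-bit offset.
The Python sketch below shows that split; the field layout,
the assumption of full 64K pages and the function names are
illustrative only and are not taken from the hardware
documentation.

```python
PAGE_BITS = 4       # 16 pages
OFFSET_BITS = 16    # up to 64K words per page (assumed full size)

def split_address(addr):
    """Split a one-megaword address into (page, offset)."""
    if not 0 <= addr < 1 << (PAGE_BITS + OFFSET_BITS):
        raise ValueError("address outside one megaword")
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)

def join_address(page, offset):
    """Rebuild the one-megaword address from page and offset."""
    return (page << OFFSET_BITS) | offset
```

Only the offset part fits in the 64K range the array
processor itself can see, which is why page changes involve
the host.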

This option modifies the AP Direct Memory Address
Register (APDMA) located in the DMA Control section of the
interface by extending it from 16 to 20 bits, giving 2**20
addressing capability (approximately one megaword). This
virtual memory ability is called the AP Memory Address
Extension (APMAE) and new addresses can only be loaded by
the host. Since the host will control all paging
operations, the AP-120B commands will not change inasmuch as
it will only recognize 64K word locations i"^].

4.   Programmable I/O Processor

The Programmable I/O Processor (PIOP) is a micro-
codable micro-processor which acts like a high speed channel
program controlling an input/output port. It is capable of
transferring data at a six megahertz burst rate or at a
three megahertz sustained operation rate (assuming 167
nanosecond Main Data Memory). The PIOP can be used with up
to eight external devices (like A/D converters or mass
storage devices) thereby acting as an I/O bus controller.

The PIOP interfaces directly with the DMA Controller
in the interface unit. It has a 38-bit instruction word, a
20-bit arithmetic logical unit and is capable of addressing
up to one megaword of memory, making it compatible with the
Page Select Option. Communication with the AP-120B is
accomplished via one of eight flags and four interrupts.
The micro code supports subroutines and has the logic to
perform jumps within its own code.

The PIOP must handle all handshaking and timing
considerations with both the external devices and the host
program to insure data integrity. This can be complicated at
times so a Programmable I/O Channel (PIOC) is also available
which decreases flexibility but eases the programming burden
[33].

Neither the PIOP nor PIOC provides a method of
connecting two AP-120B's together in series without host
intervention, which tends to limit some of the possible
applications of the AP-120B.

C.   PROGRAMMING, OPERATION AND EXECUTION

The AP-120B can utilize the parallel operation
capability of the adder, multiplier and data transfers to
increase execution speed of the program and throughput on
large data arrays. These parallel operations must be
controlled so that optimum execution speed can be realized
without causing interlock or lockout. Lockout could
eventually lead to a program stoppage [1]. Since most
scientific data can best be structured into an array form,
the array processor is able to work on it quickly and
efficiently in its natural state where a general purpose
computer must, in most cases, restructure it [36].

Before the AP-120B can work on data, the data must first
be transferred from its memory locations in the host to Main
Data Memory in the array processor (or moved to Main Data
Memory from an external device via the PIOP. That situation
will not be dealt with here since the PIOP is programmable
and therefore path and data options associated with it are
many.). The data is transferred via the interface with the
use of the APPUT(HOST,AP,N,TYPE) command (Put Data into the
AP-120B). As with arguments of other AP-120B CALL
statements, HOST, AP, N and TYPE need not be explicitly
stated but can be expressions, integers or variables.

The host and AP-120B must be synchronized in their
operations so computations cannot go on while data is still
being transferred to memory. APWD (Wait on Data) causes the
host to wait until data transfer is completed before it
resumes executing the program. APWR (Wait on Running)
causes the host to wait until the AP-120B is completed with
one command before another is sent over. APWAIT is a
combination of APWD and APWR. One difficulty encountered
using these commands is that the host must monitor the
progress of the execution if polling is used to determine
APWD, APWR or APWAIT completion, or the AP-120B must wait if
priority interrupts are used, which increases the time
necessary to complete the program.


Some of the overhead of the host can be eliminated by
not using the AP Wait on Running (APWR), AP Wait on Data
(APWD) or AP Wait (APWAIT) commands. This technique may
speed up program execution but should only be used when it
is absolutely necessary and when there is no chance that the
results will be processed before they are actually present
in the AP-120B Main Data Memory. Floating Point Systems
suggests that the program first be written and executed with
the APWR, APWD and APWAIT commands present and the results
obtained. Then, removing a few of these instructions at a
time, the results can be checked to see if they match the
original results. This only works for specific applications
and does not conform to modern programming practices. It is
also extremely dangerous since it does not allow for speed
fluctuations due to temperature variations.

When processing is complete, the data can be transferred
back to the host via the APGET() command, which operates in
the same manner as the APPUT.

The application program resides in the host memory and
the host executes this program. The host will determine
which routines must be passed to the AP-120B and if the data
necessary is present in the array processor. When a routine
is called, the host will jump to it and execute it, but if
the routine called is part of the math library (whether from
APMATH or a user written math routine), the host first jumps
to APEX. APEX then loads the 64-bit instructions into the
AP-120B Program Source Memory, calculates the remaining
space available in the Program Source Memory, updates the PS
location table, loads the parameters and initiates the
execution. If the same routine is called again immediately,
it will not be reloaded since it is already present but only
the new parameters will be loaded. If a different routine
is called, APEX will first check the PS location table to
see if there is enough unused space available to load it
without destroying any routines currently residing in
Program Storage. If not enough space is available, the
last-written program will be overwritten with the newly
called routine (Last In First Out (LIFO)).
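
The loading policy described above can be modeled in a few
lines of Python. This is a sketch under stated assumptions
and not APEX itself: the class and method names are
hypothetical, any resident routine is treated as "already
present" (the text only states this for an immediate repeat
call), and the routine names and lengths in the usage note
are made up for illustration.

```python
class ProgramSourceTable:
    """Toy model of a PS location table with LIFO overwrite."""

    def __init__(self, capacity):
        self.capacity = capacity    # words of Program Source Memory
        self.resident = []          # (name, length) in load order

    def used(self):
        return sum(length for _, length in self.resident)

    def call(self, name, length):
        """Return True if the routine had to be loaded."""
        if length > self.capacity:
            raise ValueError("routine larger than Program Source Memory")
        if any(n == name for n, _ in self.resident):
            return False            # already present: parameters only
        while self.used() + length > self.capacity:
            self.resident.pop()     # overwrite the last-written routine
        self.resident.append((name, length))
        return True
```

With a 100-word memory, calling "VADD" (40 words), "CFFT"
(50 words) and then "VMUL" (30 words) evicts "CFFT" and
leaves "VADD" in place, since the most recently written
routine is the first to be overwritten.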

The overhead required for each math library routine
called is between 100 and 1000 microseconds. One hundred
microseconds is the minimum time required to check the table
and move parameters. This minimum time is required for
every call, even in looping operations. During this period,
the host must be available to the AP-120B, which would cause
unnecessary host overhead. While the AP-120B is executing
any specific routine, the host can be freed to do other
tasks and treat the AP-120B as a peripheral device. The
host can either be interrupted or can use polling techniques
to determine if the array processor requires assistance. In
either case, the programmer must be aware of when a break
occurs so he can insure that the proper sequence of routines
is used to allow the host to perform other operations and
not be burdened by many AP-120B services.
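
Because the per-call overhead applies even inside loops, its
total is easy to bound. The short Python sketch below uses
only the 100 and 1000 microsecond figures quoted above; the
function name and the call counts in the usage note are
assumptions for illustration.

```python
MIN_OVERHEAD_US = 100     # table check and parameter move
MAX_OVERHEAD_US = 1000    # maximum data and program transfer

def overhead_bounds_us(calls):
    """Lower and upper bound on host overhead for a number of calls."""
    return calls * MIN_OVERHEAD_US, calls * MAX_OVERHEAD_US
```

A loop that issues 1000 library calls costs the host at
least 100,000 microseconds (0.1 second) of overhead. Folding
four such calls per iteration into one reduces that lower
bound by a factor of four.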

Several ways to increase available free time in the host
are to transfer more than one vector with each APPUT or
APGET command, use optimum AP-120B library calls to perform
given operations (it is the programmer's responsibility to
determine which AP routines are best for each situation) and
overlap host and AP-120B operations whenever possible.
Since every call of a routine requires host intervention,
several routines can be combined into one by writing a
special macro combining those routines, which will
effectively eliminate some host overhead by using only one
"call" statement. (But these macros must be small due to
limited AP-120B program memory.) Since host overhead varies
between 100 and 1000 microseconds, with the higher value
being due to the maximum amount of data and program
transfer, some overhead can be eliminated by loading the
most used routines first, since overwrite is accomplished by
LIFO. APEX must also be a part of the interrupt priority
scheme of the host (interrupt or polling); therefore, by
having the AP-120B at a high priority, the overall wait time
of the system due to interrupt wait can be minimized [8].

V.   MAP-300


The MAP-300 (Macro Array Processor) (fig 13) is manufactured
by CSP Incorporated, Burlington, Massachusetts. The basic
structure consists of three independent busses, an executive
routine, two parallel arithmetic units, an addresser and an
input/output handler, each having its own clock and
operating in a parallel asynchronous fashion. The basic
logic units are the Central System Processor Unit (CSPU),
the Arithmetic Processor (AP) (consisting of the Arithmetic
Processing Unit (APU) and the Addresser Processor Section
(APS)), the Host Interface Scroll (HIS) and an optional
Input/Output Scroll (IOS). All except the CSPU use micro-
coded routines stored in their own small memories and
communicate with each other via flags set in registers.
(The CSPU stores its micro-coded routines in main MAP
memory.) The Host Interface Module (HIM) section of the HIS,
the IOS and the CSPU are built around a standard Intel 3002
bit slice micro processor.

The representation of MAP-300 numbers is usually a
32-bit floating-point format with a one-bit sign, a seven-bit
exponent (giving a range of 16 ** -64 to 16 ** 63; the
exponent is biased by 64, so 0 to 127 are the actual numbers
stored) and a 24-bit mantissa, allowing a total range of
10 ** -77 to 10 ** 76. Sixteen-bit floating-point and 16-bit
fixed-point numbers are also available. MAP-300 main memory is
addressable in either 32-bit full-words or 16-bit half-words,
but eight-bit bytes can be accessed by packing pairs into a
16-bit half-word [18]. SNAP-II commands like VFIXB assume
this packing exists [15]. The ability to address in
half-words and/or bytes is important as it may increase the
efficiency of the program and array processor, allowing
operations to be performed which may not have otherwise fit
in a word-only addressable memory.
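The format described above can be sketched as a small decoder. The field widths, the base-16 exponent and the bias of 64 come from the text; the bit positions (sign in the top bit, then exponent, then mantissa) and the treatment of the mantissa as a sign-magnitude fraction are assumptions made for illustration and are not confirmed by the source.

```python
import math

EXP_BITS = 7
MANT_BITS = 24
EXP_BIAS = 64


def decode_map_float(word):
    """Decode a 32-bit word under the assumed field layout."""
    sign = (word >> 31) & 0x1
    stored_exp = (word >> MANT_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = word & ((1 << MANT_BITS) - 1)
    fraction = mantissa / float(1 << MANT_BITS)   # assumed: 0 <= f < 1
    value = fraction * 16.0 ** (stored_exp - EXP_BIAS)
    return -value if sign else value


# Decimal exponents corresponding to the range 16**-64 .. 16**63.
low_decade = math.log10(16.0) * -64
high_decade = math.log10(16.0) * 63
```

Rounded to whole decades, the two computed exponents agree with the 10 ** -77 to 10 ** 76 range quoted in the text.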

Although the MAP-300 is asynchronous, the advertised
average CSPU cycle time is approximately 70 nanoseconds, with
about 500 nanoseconds required for a memory read/write
operation when using 500-nanosecond MOS memory
(125 nanoseconds using bipolar). Full-word operands and
results starting on an odd address boundary, however,
require about two 500-nanosecond memory cycles. A
pseudo-operation can be used to ensure that even-boundary
locations exist [18].
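The alignment penalty can be illustrated with a short sketch. It assumes addresses are counted in 16-bit half-words, so that a full-word starting on an odd half-word address needs two memory cycles; the function names are hypothetical, and the 500-nanosecond figure is the MOS cycle time quoted above.

```python
MOS_CYCLE_NS = 500


def fullword_access_ns(halfword_address, cycle_ns=MOS_CYCLE_NS):
    """Approximate access time for a 32-bit operand (assumed model)."""
    cycles = 2 if halfword_address % 2 else 1
    return cycles * cycle_ns


def align_even(halfword_address):
    """Round an address up to the next even half-word boundary."""
    return halfword_address + (halfword_address % 2)
```

Forcing operands onto even boundaries, as the pseudo-operation does, trades at most one wasted half-word for a halved access time in this model.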

The MAP-300 is capable of operating in temperatures
from 0 to 50 degrees centigrade at 10 to 90 percent
humidity. The power requirements are either 115 VAC or 230
VAC single phase plus or minus ten percent at 47 to 63
hertz. The weight is approximately 80 pounds.

The MAP relies heavily on internal parallel processing
to increase throughput and limit wait time. The MAP-300
stores the executive and array routines in its own memory
(as opposed to storing them in the host memory). With the use
of function lists and statements like "MPWHL" (MAP version
of the "DO WHILE"), the MAP can operate independently of the
host after initial loading of the program [15]. With the
three-bus structure, the MAP theoretically can
simultaneously input into one memory, output from the second
while doing computations on the third and never utilize the
host except for initialization.
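One way to picture the three-bus arrangement is as a rotation of roles among the three memories. The sketch below is only a scheduling illustration under that assumption; the role names and the rotation order are not taken from the MAP-300 documentation.

```python
from collections import deque


def memory_roles(num_blocks):
    """Return, per block, the role assumed for each of three memories."""
    roles = deque(["input", "compute", "output"])
    schedule = []
    for _ in range(num_blocks):
        # Memory i + 1 holds roles[i] during this block.
        schedule.append({"memory%d" % (i + 1): role
                         for i, role in enumerate(roles)})
        # A memory that was just filled is computed on next, and a
        # memory that was computed on is drained next.
        roles.rotate(-1)
    return schedule


plan = memory_roles(3)
```

In every block each role is held by exactly one memory, which is the condition for the three activities to overlap without contending for a bus.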

The MAP has a separate instruction set for the Central
System Processor Unit (CSPU), Arithmetic Processor Unit
(APU), Addresser Processor Section (APS), and Host Interface
Scroll (HIS). Inasmuch as these processors work
independently, the instruction sets are not as complicated
as may have been necessary if operation was controlled
totally from a central site. The total number of
instructions per second attainable by the MAP-300 is data
dependent. Whenever all steps necessary to perform the
operation are completed, as witnessed by properly setting
the correct flags in pseudo-memory (to be discussed later),
the operation will perform to completion. While the
addition/multiplication operation is being carried out in the
APU, preparation for the next word (half-word) of
information can be conducted in the unaffected processors.
System flags are used to communicate between the processors.
These flags include General Purpose flags available to the
programmer for general system communication, Control flags
to control processor modes and operation sequencing, Status
flags to indicate processor status and Hardware
Configuration flags [18].

The MAP-300 system installed for evaluation consisted
of: the MAP-300 processor, interface with the PDP-11
computer utilizing the RSX-11M operating system, 24K words
of 500-nanosecond MOS master memory (8K for each memory),
power panel, expansion chassis, installation, I/O driver,
SNAP-II algorithm library, cross assembler, simulator and
loader. The price of the system was $44,500 [21].

A.   CHARACTERISTICS AND HARDWARE
1.   CSPU 


The Central System Processor Unit (CSPU) (fig 14) is the
"Command Central" of the MAP-300 array processor. The CSPU
responds to commands from the host, transfers data to and
from the host, assists the APS in address calculations and
loads the program memories of the Arithmetic Processor and
Host Interface Module. The CSPU performs the functions of a
front-end microcomputer to control the actions of the
system.

The CSPU has a fast, fixed-point arithmetic unit for
address calculations, an instruction register, an
eight-register accumulator file and a priority interrupt
network. It has access to the three main memories via the
memory busses and supplies the other MAP processors with the
program instructions they need from main memory. Reentrant
subroutines and multi-level indirect addressing are
recognized by the CSPU. It has no I/O capability but
instead instructs the Host Interface Scroll (or I/O Scroll)
to perform input or output operations to or from the host
(or external devices). The CSPU will never halt but will
always be in the WAIT state after its instruction sequence
is completed.

Figure 14. CSPU Block Diagram

An important register in the CSPU is the Control
Status Register or C-State Word (CSW). It is a 32-bit
register containing the status of prior operations, the
program counter as well as the source and destination
locations for block memory transfers. Fields of the
register can be combined to give hardware condition codes
for use in conditional operations, branches, jumps or
executes. The CSW also stipulates on which bus instructions
or data are present and controls the interrupt responses for
other units.

The CSPU is the only processor able to be
interrupted in the MAP (other processors can either Halt or
Wait) and contains a 64-level interrupt priority system with
one interrupt device per level and three lines per device
(192 possible combinations). The CSPU may only be
interrupted between instructions. It will also nest and
queue lower priority interrupts if a higher priority
interrupt is received during the servicing of a lower
priority interrupt. These interrupts are detected by
polling, and levels are polled only if they are above the
current interrupt level. Lower level interrupts will
continue to exist but will not be recognized until the
higher priority interrupts are serviced.
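The polling rule described above can be sketched as follows. The 64 levels come from the text; treating a larger level number as higher priority, and the class and method names, are assumptions made for illustration.

```python
NUM_LEVELS = 64
IDLE = -1  # no interrupt currently being serviced


class InterruptPoller:
    """Toy model: only levels above the current one are polled."""

    def __init__(self):
        self.pending = set()
        self.stack = []          # levels being serviced, nested

    def current_level(self):
        return self.stack[-1] if self.stack else IDLE

    def request(self, level):
        if not 0 <= level < NUM_LEVELS:
            raise ValueError("level out of range")
        self.pending.add(level)

    def poll(self):
        """Between instructions: accept the highest eligible level."""
        eligible = [lv for lv in self.pending if lv > self.current_level()]
        if not eligible:
            return None
        level = max(eligible)
        self.pending.discard(level)
        self.stack.append(level)   # nest above the interrupted level
        return level

    def finish(self):
        """Service routine done: return to the interrupted level."""
        return self.stack.pop()
```

A lower-level request stays in the pending set and is picked up only after every higher-level service routine has finished, matching the behavior the text describes.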

The CSPU contains no memory but uses main memory to
store its instructions. When fetched, these instructions
are stored in the instruction register until execution. The
CSPU may also address a pseudo-memory location called System
Flag Register (SYSFLG) which is the primary inter-processor
communication system. By testing the bits of SYSFLG, the
CSPU can sense the status of any of the other processors.
(Pseudo-memory refers to memory physically located within
the sub-processors but which appears on the bus as a memory
address, similar to the PDP-11/34/45/55/60/70.) [18].

2.   Arithmetic  Processor 

The Arithmetic Processor consists of two components,
the Arithmetic Processor Unit (APU) and the Addresser
Processor Section (APS).

a.   APU 

The Arithmetic Processor Unit (APU) (fig 15) is
responsible for the computation required in array processing
and executes programs relatively independent of the other
MAP processors, operating under the general control of the
CSPU. The APU consists of two adders, two multipliers (the
main distinction between the MAP-300 and the MAP-100 or
MAP-200 is that the former contains two adders and
multipliers while the others contain only one each), 34
various registers and three First-In-First-Out (FIFO)
buffers for input and output storage. The two adders and
two multipliers permit parallel processing of data to
increase throughput. APU programs are stored in main MAP
memory and are sequentially block-transferred to the APU
program memory under control of the CSPU.

The main units of the APU are the arithmetic
processors (AP1 and AP2). Each arithmetic processor
consists of an adder and multiplier that may operate
simultaneously and independently of each other. Each adder
is fed by eight registers and each multiplier by four
multiplicand registers and four multiplier registers. The
results of the adder are routed to the result register R and
the multiplier loads the product register P. To transfer
data between the separate arithmetic processors, an exchange
register is provided.

APU memory consists of two 256-word 16-bit
side-by-side memories. The memory is initially loaded by
the CSPU from MAP memory and the APU is then put into the
run state. Instructions are sequentially decoded in the APU
to perform the specified algorithm. The instructions are
16 bits for each board (AP1 and AP2) and are executed in
parallel. They can perform addition, multiplication,
transfer of data and the setting of flags. These
instructions are decoded and the operation started as soon
as all necessary conditions are met. Immediately, the next
instruction is retrieved and decoded and attempts to be
executed. If the P or R register is involved in a
multiplication/addition operation which has not yet been
completed, the Input Queue (IQ) is empty or the Output Queue
(OQ) is full, the APU will go into a "wait" state. It will
remain in this "wait" state until the
multiplication/addition instruction is completed or the
other conditions are satisfied. There is a problem that can
exist due to the side-by-side 16-bit memories used for
program storage. Since there is only one program counter
and the AP1 and AP2 processors work in parallel, the
side-by-side memory acts as two halves of a 32-bit
instruction register. Therefore, if one board (AP1 or AP2)
is forced to wait, the other must also wait since the next
instruction may not be retrieved until the program counter
can be incremented.

The Input Queue is a four-deep FIFO buffer which
services both AP1 and AP2. To get the next input data
field, the IQ must be advanced before the data is
transferred. If both boards request data without advancing
the queue, they will receive the same data, which may be
good for certain applications. If they both simultaneously
try to advance the IQ, it will advance only once and give
AP1 priority, then advance the second time after the
transfer has been completed to give data to AP2.
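A toy model may help make the Input Queue behavior concrete. The four-deep FIFO and the read-without-advance behavior are from the text; the class, its method names and the way a simultaneous advance is expressed are illustrative assumptions.

```python
from collections import deque

IQ_DEPTH = 4


class InputQueue:
    """Toy four-deep FIFO shared by AP1 and AP2."""

    def __init__(self):
        self.slots = deque()

    def load(self, value):
        if len(self.slots) >= IQ_DEPTH:
            raise OverflowError("IQ full; the loader would have to wait")
        self.slots.append(value)

    def read(self):
        """Read the head without advancing; repeated reads match."""
        if not self.slots:
            raise LookupError("IQ empty; the APU would wait")
        return self.slots[0]

    def advance_and_read(self):
        """Advance to the next entry, then read it."""
        self.slots.popleft()
        return self.read()

    def both_advance(self):
        """Simultaneous advance: AP1 is served first, then AP2."""
        ap1 = self.advance_and_read()
        ap2 = self.advance_and_read()
        return ap1, ap2
```

Reading without advancing is what lets both boards work from the same sample, while the simultaneous-advance case hands successive samples to AP1 and then AP2.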
There are two Output Queues, each of which is a
four-deep FIFO buffer. These queues allow maximum capacity
of the adder and multiplier to be utilized, since it is less
likely that the processor will have to wait for either
buffer to have a vacancy due to a busy bus system. If both
processors try to act on any single OQ, processor AP1 will
be given the priority.

A typical multiplication takes approximately six
cycles (420 nanoseconds) and a typical add takes about three
cycles (210 nanoseconds). Therefore, to increase
throughput, "hiding" adds, moves, etc. behind multiplies
will accomplish operations in the time it takes to do the
multiply alone. The most efficient method to program the
MAP-300 is to treat successive sample sets in alternate
processors; this effectively produces a multiply every
210 nanoseconds. Since there is one input queue, this
method allows both to have access to the same information
(by not incrementing the queue) and also gives a greater
chance to use hiding effectively.
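The timing figures above follow from the 70-nanosecond cycle time quoted earlier (given there for the CSPU), as the short calculation below shows. The constant names are illustrative, and the model simply assumes two multipliers running in perfect alternation with no waits.

```python
CYCLE_NS = 70          # advertised average cycle time
MULTIPLY_CYCLES = 6
ADD_CYCLES = 3
NUM_MULTIPLIERS = 2    # AP1 and AP2

multiply_ns = MULTIPLY_CYCLES * CYCLE_NS
add_ns = ADD_CYCLES * CYCLE_NS

# With sample sets alternated between AP1 and AP2, one product
# completes every multiply_ns / 2 on average.
effective_multiply_ns = multiply_ns // NUM_MULTIPLIERS

# An add fits entirely inside the time of one multiply.
add_is_hidden = add_ns <= multiply_ns
```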

The APU can usually operate in two modes. Mode
One, the normalized mode, can use either normalized or
unnormalized floating-point numbers as input with the
results being a normalized floating-point number. Using
unnormalized floating-point numbers as input can lead to
precision loss since the normalization process will shift
the mantissa to the left (values less than .1) or to the
right (values greater than 1.0). The vacancies created by
these shifts will be filled with zeros, which, after
computation, could possibly produce an unusual truncation.
The unnormalized mode will accept unnormalized numbers as
input and will return unnormalized numbers as output [16].

b.   APS 

The Addresser Processor Section (APS) (fig 16)
computes both the address in MAP memory for the location of
input data words to be processed by the APU and the MAP
memory addresses for the output from the APU. It operates
independently of other processors, within status and control
flag constraints of SYSFLG. The APS contains a 128-word
25-bit memory, four program counters (two for read and two
for write), eight address buffers (to be used as inputs to
the adder), four First-In-First-Out (FIFO) buffers, an
arithmetic logic unit (adder), and associated logic and
control units.


The APS programs are stored in MAP main memory
and are loaded by the CSPU. Certain absolute address
locations must be known to an APS program at run time which
are not available during program writing. The assembler
computes them at assembly time and the CSPU inserts them
into the proper location during this program transfer. The
CSPU then initiates APS operation by setting the proper
flags. The APS may be loaded with new information by the
CSPU during run time by cycle stealing, thereby not causing
the APU to slow and wait for a value in the IQ or a space in
the OQ. Because the instructions in MAP memory are 32 bits
long and the APS instruction is only 25 bits long, the seven
bits left over are used to store the APS memory address for
that instruction. This allows the CSPU to increase
throughput by immediately installing the instruction into
the correct location in a pre-computed order.
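The packing just described can be sketched with a pair of helper functions. The 25-bit instruction and 7-bit address widths come from the text; placing the address in the upper seven bits of the 32-bit word is an assumption, since the source does not give the field order, and the function names are hypothetical.

```python
INSTR_BITS = 25
ADDR_BITS = 7                      # 2**7 = 128 APS memory words
INSTR_MASK = (1 << INSTR_BITS) - 1
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_aps_word(aps_address, instruction):
    """Combine an APS address and instruction into one 32-bit word."""
    if not 0 <= aps_address <= ADDR_MASK:
        raise ValueError("APS address needs more than 7 bits")
    if not 0 <= instruction <= INSTR_MASK:
        raise ValueError("APS instruction needs more than 25 bits")
    return (aps_address << INSTR_BITS) | instruction


def unpack_aps_word(word):
    """Split a 32-bit word back into (APS address, instruction)."""
    return (word >> INSTR_BITS) & ADDR_MASK, word & INSTR_MASK
```

Seven address bits cover exactly the 128 words of APS memory, which is why the spare bits of the 32-bit word are sufficient for this purpose.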

The adder computes addresses dependent on prior
computational results, literals or specified increments.
All address addition and subtraction is considered to be
modulo 2 ** 17 so that only positive addresses in that range
will be computed. Results are queued in either the Read
Address FIFO (RAF) or Write Address FIFO (WAF). Along with
the address is a code to delineate whether the address is
full-word, half-word or byte (pair of bytes in a 16-bit
half-word address) and if it is an eight-bit fixed-point
number, 16-bit fixed-point number, 16-bit floating-point
number or a 32-bit floating-point number.
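Modulo 2 ** 17 address arithmetic can be illustrated in a few lines. The modulus is from the text; the function name and the sample values are illustrative only.

```python
ADDRESS_MODULUS = 2 ** 17   # 131072 addressable locations


def next_address(base, increment):
    """Add a (possibly negative) increment, wrapping modulo 2**17."""
    return (base + increment) % ADDRESS_MODULUS
```

Because the result is always reduced into the range 0 to 2 ** 17 - 1, a negative increment from a low address wraps to the top of the range instead of producing a negative address.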

The distinctive feature of the APS is that there
are four program counters (P0, P1, P2 and P3). These allow
four separate programs to be stored in the APS and executed
in an interleaved manner. Sequencing of these programs is
controlled by the status of the WAF and RAF in conjunction
with the APS instructions. These program counters also
provide a looping ability allowing the APS to work with the
Host Interface Scroll or I/O Scrolls to keep data flowing.
After one memory has been processed and reloaded, the APS
need not be reinitiated but can continue operation on the
new data by this looping feature [18].

3.   Host  Interface  Scroll 


The Host Interface Scroll (HIS) consists of two
subsections, the Host Interface Module (HIM) (fig 17) which
is located in the MAP-300 and the Host Interface Controller
(HIC) which is located in the host memory. The Host
Interface Module transfers MAP programs, unprocessed data,
host status and Host Interface Controller commands from the
host to the MAP. Processed data, MAP status and processing
commands are also transferred from the MAP to the host via
the HIM. A programmable scroll processor is provided for
computing MAP and host memory locations during a Direct
Memory Access (DMA) operation. Other pertinent devices
include a memory-bus interface, controllers for host memory,
format conversion hardware, status and control logic along
with interrupt logic.

The HIC controls the handshaking necessary between
the host and the MAP. The handshaking consists of interrupt
logic from MAP to host and logic necessary for controlling
the transfer of data with either Direct Input/Output (DIO)
facility or DMA transfer [18].

The host generally interrupts the MAP to initiate
program sequencing. However, when the MAP has finished, it
will initiate communication (interrupt) with the host for
further work. When the interrupt is acknowledged by the
host, more data or programs are sent to the MAP depending on
the flags. (If all MAP processors are in a loop operating
on data supplied from external devices and delivered to
external devices via I/O Scrolls, the host will not be
interrupted unless there is an error. This frees the host
to do any other unrelated processing necessary.) The
maximum response time to initiate an interrupt is 150
microseconds for the HIM and 250 microseconds for a user
CALL routine [35].

4.   Memory


Main memory in the MAP-300 consists of three
independent busses, each having the capability of 256K words
of 500-nanosecond MOS memory or 64K words of bipolar memory.
Memory types may not be intermixed on any given bus but each
bus may have a different type from another bus. Memory can
also be either master or slave, master memory being used to
control program execution, arbitrate and observe system
protocol while slave memory stores the data. Each memory
bus containing memory is required to have at least one
master memory module (available in either 4K or 8K blocks
for MOS or 1K, 2K or 4K blocks for bipolar).


Access to each memory is via a common bus having 11
ports and two priority levels. Three ports are reserved to
be used with the absolute priority scheme, leaving eight
ports with a sequential round-robin (polite) priority
scheme. Absolute priority is the highest priority and is
intended to be used with high-speed minimally-buffered
devices such as disc units or tape units where loss of data
may result. Sequential round-robin priority handling is
used for slower buffered devices and is a round-robin
(circular) queue which is checked each memory cycle. The
device first in the queue will get the next memory cycle.
Scanning for the next queued device will commence
immediately upon the previous device starting transfer. When
the next memory cycle occurs the new device will be known,
keeping overhead minimal. Of these 11 ports, the HIS and
CSPU each have one dedicated port and the AP has two
dedicated ports on each bus with seven ports remaining for
the IOS and other uses.
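The two-level scheme can be sketched as a small arbiter. The port counts (three absolute, eight round-robin) follow the text; the port numbering, the rule that a lower-numbered absolute port wins a tie, and the class itself are assumptions for illustration.

```python
ABSOLUTE_PORTS = (0, 1, 2)             # assumed numbering
ROUND_ROBIN_PORTS = tuple(range(3, 11))


class BusArbiter:
    """Toy arbiter: absolute ports first, then a circular scan."""

    def __init__(self):
        self.next_index = 0   # where the round-robin scan resumes

    def grant(self, requesting):
        """Return the port given the next memory cycle, or None."""
        for port in ABSOLUTE_PORTS:
            if port in requesting:
                return port
        count = len(ROUND_ROBIN_PORTS)
        for offset in range(count):
            index = (self.next_index + offset) % count
            port = ROUND_ROBIN_PORTS[index]
            if port in requesting:
                # Resume scanning just past the port that was served.
                self.next_index = (index + 1) % count
                return port
        return None
```

Two round-robin ports that both request continuously alternate cycles in this model, while any absolute-priority request preempts them without disturbing the scan position.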

Pseudo-memory (alluded to earlier) is the upper 4K
words on Bus 1 containing addresses of certain registers
used for status and control. These registers are located in
the sub-processors but appear as addresses on the memory
bus. Any sub-processor may alter the contents of these
locations so it is important that the programmer not try to
overwrite these addresses with programs or data [18].


B.   SOFTWARE SUPPORT

As with the AP-120B, there are software routines to aid
in program development and execution.

1.   Executive  and  Associated  Routines 


a.   Assembler

The MAP-300 assembler, written in ANSI Fortran
IV, takes a source program written for either the CSPU, APU,
APS, HIS or IOS and creates an executable object module. A
listing file and errors file can also be created. Editing
and updating can be accomplished from the last source file
by changing and assembling only the incorrect line (or
lines) of code, thereby avoiding the reassembling of the
entire program [18]. The assembler will also allow change
of the HIM memory to enable it to handle necessary
buffering.

b.   Simulator 

The MAP Simulator Program simulates model 200
and model 300 processors by executing MAP object code. The
simulator permits the programmer to develop or debug
software off-line so as not to disturb production schedules.

The MAP Simulator Program has the capability of
simulating the operation of the APU, APS, CSPU, memory and
the interrupt handler. It has not been updated to handle
certain new commands and flags (listed in the front of
ref [25]) nor does it have the ability to simulate the APU
test mode. Memory size and type can be specified either in
the initial loading of the simulator or while running to
tailor it for current or proposed configurations.

When used as a debugging aid, the MAP Simulator
Program allows the operator to: install breakpoints and
execute macro instructions at these breakpoints; detect
program errors and execute macro instructions after their
discovery; examine register contents; run programs from
different processors (APU, CSPU, etc.) independently; and
patch loaded programs. Input/output may be obtained from a
terminal, printer, tape (magnetic or paper), cards or
cassette. A batch mode is also available. Actual program
timing can be estimated by installing breakpoints and
individually timing small sections of code [25].

c.   Loader

The MAP Loader is a Fortran program which
accepts object code produced by the Assembler and creates
blocks of binary code in MAP machine language. This code is
transmitted to the MAP memory via the MAP driver through the
Host Interface Scroll. Errors in transmission are
detectable since check-sum digits are transmitted to the MAP
along with the blocks of code. The Merge operation creates
and updates the tables and addresses necessary if the loaded
module is to be used with the SNAP-II executive [22].

d.   Debug Package

The MAP-300 diagnostic package is designed to
verify hardware operations and isolate any malfunction to a
specific card. One module is resident in the host while
another, which contains the test modules and test programs
necessary to determine proper system operation of the CSPU
and other sub-processors, is present in the MAP. This
software can run interactively or under batch processing
[18].

The MAP-300 LOOK program permits the programmer
to examine MAP memory (or pseudo-memory) from any computer
capable of operating under ANSI Fortran IV. This is also an
interactive routine and provides the ability to "patch"
coded program segments or enter entire machine language
programs. The programs or segments can then be stepped
through to examine the results closely [20].

2.       SNAP-II 


Systematic Notation for Array Processing Version II
or SNAP-II is a single-command high-level macro-type
language used to program the MAP-300 array processor. The
SNAP-II package consists of a Host Support Module, Host/MAP
driver module, SNAP-II Executive, SNAP-II Function Modules
and an Installation Test and Acceptance Test Module [18].

The SNAP-II executive permits the user to define
buffer size, and the structure and location of programs in
MAP memory. The executive also structures the routines to
operate at maximum speed by ensuring that the maximum
possible parallelism exists between sub-processors (for
CSPI-written functions), thereby accentuating "hiding". The
SNAP-II subroutines are written in ANSI Fortran and passed
to the MAP via Function Control Blocks (FCB). The MAP
Driver, which is located in the host, directs the loading
and operation of the programs. (In a loop or "Map While"
condition the driver need only load and initiate the
sequence then return control to the host operating system.)

SNAP-II allows the programmer to build his own
function lists with the Fortran-type statement "Map Begin
Function List" (MPBFL()) which permits the host to remain as
free as possible from the operation of the MAP.
Two-dimensional arrays are demultiplexed by SNAP-II, thereby
increasing speed of execution in the processor by not having
to compute two-dimensional address structures. SNAP-II
functions are callable from either ANSI Fortran or host
assembly language programs and are able to operate on both
real and complex data [15].

3.   Programming Language

If SNAP-II functions are not specific enough to
satisfy the programmer's needs or if they do not exist in
the SNAP-II library, new routines may be written in an
assembler-type language. The CSPU, APU, APS and HIS each
have their own instructions to optimize each sub-processor's
capabilities.

The CSPU instructions are broken into 10 groups
which have the ability to perform all the functions that a
general-purpose computer is normally visualized as
performing. They include: generic (performs interrupt
system coding and looping); single register; move; logical;
push and pop; hop and jump (a hop is within 256 half-word
locations and a jump can be to any new location); skip and
bit manipulation; compare; and maintenance and test console
instructions. The APU can perform: two-argument adder;
single-argument adder (like approximate reciprocal
instructions); multiply; data transfer; jump and call; and
control operation instructions. The APS performs: load;
address increment; register arithmetic and control type
instructions. The HIS recognizes: single register; logical
register; arithmetic register; literal and control
instruction types [18].

Since each sub-processor is designed to perform a
special operation and can be programmed to optimize that
design, the overall performance of the system is increased.
All processors perform in parallel and stay in "sync" by the
use of flags. A sub-processor will wait until the proper
flag is set before continuing, thereby ensuring integrity.
The waiting also relieves the programmer of "counting
cycles" with No Operation (NOP) instructions which could
possibly cause lost data. The drawback is that he does have
an increased complexity by ensuring that proper flags are
set at the proper time [16]. Most of these encumbrances are
eliminated by the executive, however. Flags are available in
pseudo-memory and are easily tested. The complexity issue
is minimal since for most applications only APU and APS
routines need be written. Only under special circumstances
is a CSPU or HIS routine required.

Pseudo-operations are also available to ease the
programming burden. They perform such tasks as naming
character strings, ensuring that information is placed into
memory on a word boundary, generating constants and making a
test Control Status Word (CSW).

4.   I/O Scrolls

The I/O Scrolls (IOS) control block-transfers to or
from external peripheral devices (including other MAPs)
without interfering with the MAP-300 processing cycle by
using a sub-processor which can be pre-programmed. The IOS
contains three functional elements: protocol logic necessary
to interface the external device directly to the MAP-300
memory busses, a programmable processor to compute MAP
addresses and issue control signals, and the transfer logic
necessary to interface with peripheral devices.

There are five basic IOS models. IOS1, also known
as the maintenance and test console, is capable of
transferring eight-bit single words to MAP bus number one at
a 5 KHz rate. IOS2 has two transfer rate options and two
word size options available. Word size option one utilizes
the block transfer of 8 or 16-bit words to any of the three
MAP busses while option two uses either 16 or 32-bit words.
Transfer rate option one conveys information at a 1 MHz rate
as compared to the 2.5 MHz rate of option two. Either
transfer rate option may be combined with either word size
option; however, only one combination is available at a time
since they are hard-wired. Under program control, IOS3 can
transfer either 16 or 32-bit words to any of the three
busses at a 750 KHz sustained rate. IOS3 can also perform
format conversion, monitor data with a basic operation
similar to the HIS and support indirect addressing. IOS4 is
a high speed (up to ^0 MHz) scroll, allowing block transfers
only, of 8, 16, 32 or 64-bit words to any bus (64-bit words
must be transferred simultaneously to bus 2 and bus 3).
IOS4 also allows packing and buffering of data [18]. IOS5
is a direct memory-to-memory bus-connect option for direct
data transfer between user devices and the MAP-300. The
module requires no software (and will not support software).
Its operation is controlled by hardware and three interrupt
request lines [21].

a.   Analog  Data  Acquisition  Module 

The Analog Data Acquisition Module model 5120
(ADAM-5120) is a programmable analog interface capable of
accepting from 2 to 16 channels of analog information. This
information is then digitized to 12-bit resolution at a 210
KHz throughput rate for the 16-channel case (125 KHz for
single channel). As with the I/O Scrolls, the A/D operation
may take place simultaneously with the MAP-300 processing.
The ADAM is functionally equivalent to the IOS2 with only
added analog-to-digital circuitry. This allows the ADAM to
be SNAP-II compatible.

The operation of the ADAM is carried out via a
set of up to 16 sample-and-hold units which then make their
signals available to a 16:1 multiplexer. Each channel of
the multiplexer is then consecutively sampled by the A/D
converter, which outputs either a 16-bit sign-magnitude or
16-bit floating-point number. Performance accuracy is
specified at 0.2 percent of full-scale resolution [2].

C.   PROGRAMMING, OPERATION AND EXECUTION


The MAP-300 can not only utilize parallel operations of
the adder and multiplier in the APU, but also the parallel
sub-processor operation of the APS, HIS, IOS, APU and CSPU
to increase total throughput. The programmer, by breaking
the problem into smaller independent programs of addressing,
arithmetic, I/O and management, can theoretically more
easily program the entire problem than by adhering to
internal communication protocol and flags [18]. The
respective programs should be easier to write, with much of
the increase in overhead due to the added handshaking and
protocol requirements being assimilated by the executive
[16].

CSPI recommends that a modified top-down programming
technique be used: the APU routine is written first to
ensure the optimum execution speed, and the other necessary
routines (generally just the APS routines) are then added to
ensure the information is present when the APU needs it.
The APU should be programmed to treat subsequent sample sets
in alternate adder/multiplier modules and arrange data so
that as many adds can be "hidden" as possible [18]. By
proper execution sequencing, total time can be shortened to
equal the time to multiply only, with all other operations
"hidden" under these multiplies. This "hiding" operation
becomes easier in the MAP-300 than in the AP-120B since
cycles need not be counted and NOPs need not be inserted
for unused cycles, due to flags being set to signal the
availability of resources [16]. The programmer must be
aware that the timing is not absolute; therefore the
executive will tightly control synchronization by flags to
ensure one adder/multiplier does not get ahead of the other.

The programs are initially loaded from the host to the
MAP via the operating system interface and driver. The
MAPOVR.MAC routine makes the standard interface through the
operating system and MPQRV.MAC makes the MAP appear as a
standard RSX-11M device to the computer. Initial
communication from the host to the MAP is done via a
four-word Driver Control Block (DCB) [26]. When the Central
System Processing Unit is initialized by the host, it will
load the other sub-processor programs and commence program
execution.


Subsequent MAP commands are sent to the MAP from the
host via Function Control Blocks (FCB) which require host
intervention to send. (Function lists and the MPWHL macro
treat multiple FCBs as a single entity.) These FCBs
transmit host-to-MAP status, interrupts and functions to
perform and can be queued in the HIS buffer. When it is no
longer necessary for the host to send or receive an FCB, it
can perform other operations [35]. Therefore, with
efficient use of the IOS and the possibility of stringing
MAPs in series, the host can be free to either perform other
tasks or act as a system monitor.
VI.   DISCUSSION OF FINDINGS


In the test bed, the PDP-11/34 was chosen to perform
the front-end functions, which consisted of buffering the
data, formatting it and then passing it to the array
processor or mass storage device (or from the mass storage
device to the array processor). This limited front-end
inputting function did not dictate that the computer be
large. The choice of the PDP-11/34 computer for this
application seems adequate. The PDP-11/04 would normally
contain enough speed to handle the necessary operations but
may be unsatisfactory since it does not have a resident
memory control and protection routine to ease the
programmer's burden and help ensure system integrity, nor
does it contain the 2K cache memory to increase speed. A
computer larger than the PDP-11/34 may not increase the
efficiency of the system although it would increase the
cost.

The test bed utilized the PDP-11/70 for the output
computer. The output computer would be required to receive
information from the array processor, manipulate the data
and store it for future display on one or more devices. For
this application, the PDP-11/70 seems best for several
reasons. The system is much like the 11/34 except that the
current maximum memory is 2 megabytes to allow for better
utilization of information. There are dedicated paths to
high performance storage devices that would allow more
information to be processed per unit of time. To further
process arrays for output, there is a 32-bit or a 64-bit
floating-point arithmetic unit available. The PDP-11/70
gives large-computer performance and expansion capabilities
with the cost and space requirements of smaller units [51].
Using the same manufacturer for the output function as was
used for the input function reduces interface problems and
contributes to the proficiency of the programmers by
increasing overall knowledge of the architecture.

The proposed test bed's uses of the 11/34 and 11/70 can
be greatly modified by the choice of the array processor.
The MAP-300 utilizing an Analog Data Acquisition Module
and/or I/O Scroll can eliminate the need for the input
functions (including 16-channel analog-to-digital
conversion), therefore permitting the 11/70 (or possibly a
less costly model) to perform input, output and monitor
functions in the test bed. In fact, the 11/70 will probably
be large enough and fast enough to facilitate combining all
subsystems, except the display subsystem, under one
computer. The 11/34 and 11/70 combination should provide
for the full range of computers necessary to properly
emulate and evaluate just how much computer capability will
actually be needed for any specific application.

The question arises as to which is the best array
processor for the application. The AP-120B is synchronous,
therefore some may say safer, has a 38-bit word which could
mean greater accuracy, more standard library functions (such
as vector log base 10 and vector log base e) and a 3500 hour
mean time before failure. The MAP-300 is a newer system
which, due to the minimal host involvement, three separate
busses, I/O Scrolls and the ADAM, can provide greater long
run throughput and more flexibility.

For the non real-time environment where simple
programming and host involvement can be tolerated, the
AP-120B may be a good choice. It can provide facilities to
tailor algorithms to specific needs; these facilities are
not so complex as to tax the normal programmer. However,
new programs cannot be added directly to the AP math library
(AP.MATH) but must be linked and loaded for every usage as
would any application program. This creates an excessive
time overhead. Therefore, the AP-120B should be used only
where simplicity and ease of use are paramount and utility
can be sacrificed.

For applications requiring real-time computations
(which the test bed most likely will eventually demand),
innovative design, high throughput rates and generally
greater flexibility, the MAP-300 provides the answer. The
improved performance of both array-processing potential and
computer availability is offset by the increased cost of
program development if non-library routines must be written.
These routines, however, may be added to the library,
effectively reducing overhead. Reference [23] reports that
the MAP-300 also complies with MIL-E-16400, MIL-L-5400,
MIL-STD-461A, MIL-STD-704B and MIL-STD-1599.

During the installation of the MAP-300 at the Naval
Postgraduate School, it was noted that the installation
documentation was extremely poor. As of this writing, three
weeks were required to install the system. This was due
mainly to the poor documentation in the installation package
received with the unit. Not only was the package
incomplete, but changes made to the software were not
reflected in the original documentation, nor was an
errata sheet provided.

It is realized that for many companies involved in data
processing equipment manufacture, documentation is not a
chief concern. However, CSPI seems to have installation
documentation far inferior to what would reasonably be
expected. This situation made it impossible to do a good
test of the system operation and allowed only a cursory
review.

Even with the evident shortcomings of the documents,
theoretically the MAP-300 is far superior to the AP-120B.
If CSPI would upgrade their documentation and perform the
installation at the site, their sometimes negative public
image could be eliminated and confidence in their equipment
could be increased. It must be noted, however, that ref [I'^J
and the publication "Simple Notation For Array Processing,
Version II, Reference Manual" are excellently written.
Therefore, in the following discussion, the use of the
MAP-300 will be assumed. I will now look at each subsystem
closely and attempt to determine alternate designs.

The analog subsystem obtains data from one of four
sources: time code reader/generator, 14-track recorder
(Honeywell 96), signal synthesizer (Rockland 5100) and/or a
noise generator (HP 3722A). Up to 128 channels of input are
amplified and sent through a programmable matrix switch,
resulting in 32-channel output signals to a programmable
32-channel filter. These analog signals then leave the
analog subsystem to be input to the signal processing
subsystem.

The AN-5400 analog-to-digital converter performs a
Id^-bit A/D conversion, and the result is then loaded into
the Ampex Megastore mass storage device through the
PDP-11/34 computer. The output of the array processor will
then be sent to the data processing subsystem.

I suggest it may be easier, more flexible and cheaper
to input the 32 channels as before to the programmable
filter, but then the 32 channels may be better handled by
two Analog Data Acquisition Modules feeding directly into
the MAP for processing or, via an I/O Scroll, model 3, be
sent to the PDP-11/70 storage devices for future use. This
will eliminate the expense of the A/D converter, Ampex
Megastore and the PDP-11/34 but, more important, it will be
relatively easy to perform calculations in real-time. Once
the MAP-300 is started, it can perform without host
intervention until interrupted, and with an assumed input of
40 KHz the system should not be taxed. The output of the
MAP can then be sent directly to the data-processing
subsystem. The entire system can also be less complex,
affording easier system development.

Assume that a fictional system with a 40 KHz input
requires an FFT and discrete digital filter to be done on the
information. The timing of a 1024 real to 512 complex
Fourier transform requires 3.0 milliseconds [21] and a 40
KHz input rate would require 39.1 FFTs per second on the
average. This would consume 117.3 milliseconds and, assuming
a 50 percent overhead, yield 175.95 milliseconds to perform
the Fourier transform. Discrete filtering would require
another 39.1 * (1024 * (2 * 500 nanoseconds + 12 * 70
nanoseconds)) or 73.67 milliseconds. Again assuming 50
percent overhead, 110.51 milliseconds would be necessary for
the filtering. The total time consumed by the two functions
would be 286.5 milliseconds, leaving 713.5 milliseconds for
other work. (Fifty percent overhead is an over-estimation.)
Loading data into the MAP-300 would be hidden behind the FFT
operation (except for the initial case) and would not
contribute to overall execution time.
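
The arithmetic of this estimate can be checked with a short
Python sketch. The function name map300_budget and its
parameter names are assumptions introduced here; the text
does not say what the 500 nanosecond and 70 nanosecond terms
represent, so they are labeled only as "slow" and "fast"
operations. The default values are the figures quoted above,
including the rounded rate of 39.1 FFTs per second (40,000
samples per second divided by 1024 samples per FFT).

```python
def map300_budget(ffts_per_second=39.1, fft_ms=3.0, points=1024,
                  slow_ops=2, slow_ns=500.0, fast_ops=12, fast_ns=70.0,
                  overhead=0.5):
    """Return the per-second time budget, in milliseconds, for the example.

    All names are hypothetical; the numbers are the ones quoted in the text.
    """
    # FFT work per second of input, before and after the overhead allowance.
    fft_raw = ffts_per_second * fft_ms
    fft_loaded = fft_raw * (1.0 + overhead)

    # Filtering cost per 1024-point frame, converted from ns to ms.
    per_frame_ms = points * (slow_ops * slow_ns + fast_ops * fast_ns) * 1e-6
    filter_raw = ffts_per_second * per_frame_ms
    filter_loaded = filter_raw * (1.0 + overhead)

    busy = fft_loaded + filter_loaded
    return {
        "fft_raw_ms": fft_raw,
        "fft_loaded_ms": fft_loaded,
        "filter_raw_ms": filter_raw,
        "filter_loaded_ms": filter_loaded,
        "busy_ms": busy,
        "idle_ms": 1000.0 - busy,
    }


if __name__ == "__main__":
    for name, value in map300_budget().items():
        print(f"{name:18s}{value:9.2f}")
```

With the default values the sketch gives about 286.5
milliseconds of work and about 713.5 milliseconds of slack
per second of input, in agreement with the totals stated
above.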

This would effectively eliminate the entire
signal-processing subsystem with the exception of the
MAP-300. The PDP-11/70 computer in the data processing
subsystem could control the MAP along with its other
intended function of controlling the display subsystem. Any
storage necessary for output or any taped input data could
be handled by the tapes and disks associated with the 11/70,
and execution could be performed on the MAP-300 along with
the above calculations. However, for expanded utilization,
not specifically addressed, the above use of only one MAP
and no PDP-11/34 may have to be modified to accommodate the
new requirements if these new requirements are significantly
larger.

If after extensive testing the MAP-300 proves to be too
costly due to unreliable software, the AP-120B can perform
the same functions, although at an increased hardware and
time cost.

For example, in the AP-120B, to perform the above real
to complex FFT, it requires 5.08 milliseconds for the FFT,
0.8 microseconds to rescale and 1.7 microseconds to reformat
the result for a total of 5.09 milliseconds per 1024 sample
FFT. To this must be added 100 to 1000 microseconds
overhead for each of the four call statements: get data
from the AP-120B (APGET), put data into the AP-120B (APPUT),
real to complex FFT (RFFT) and real FFT scale and
format (RFFTSC). I will use the arithmetic average of 550
microseconds per call for an added 2.2 milliseconds,
resulting in a subtotal of 7.29 milliseconds per FFT. APPUT
and APGET have no specific times in ref [6], but according
to Floating Point Systems the PDP-11 interface transfer rate
is 750 KHz. This would therefore require approximately 2.67
milliseconds for each 1024 element transfer, giving a total
of 9.96 milliseconds each for 39.1 FFTs. This results in a
389.5 millisecond execution and transfer time. Again,
allowing for a 50 percent overhead safety margin, the total
becomes 584.16 milliseconds per second. To perform the
discrete filtering would require an additional APGET, APPUT,
RFFT, RFFTSC as well as a vector multiply (VMUL) and a
complex vector multiply (CVMUL), bringing the time to compute
one second's worth of data to well over one second.
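
The same bookkeeping for the AP-120B FFT step can be laid
out as a brief Python sketch. The function name
ap120b_budget and its parameter names are assumptions made
for this illustration; the defaults are the figures quoted
above (5.09 milliseconds per FFT including rescale and
reformat, four calls at the average of 100 and 1000
microseconds, 2.67 milliseconds of transfer, 39.1 FFTs per
second and a 50 percent margin). The filtering step is not
modeled because the text gives no timings for it.

```python
def ap120b_budget(ffts_per_second=39.1, fft_ms=5.09, calls=4,
                  call_us_min=100.0, call_us_max=1000.0,
                  transfer_ms=2.67, overhead=0.5):
    """Return the FFT-only time budget, in milliseconds, for the AP-120B case.

    All names are hypothetical; the numbers are the ones quoted in the text.
    """
    # Host call overhead: arithmetic average of the quoted range, per call.
    call_ms = calls * (call_us_min + call_us_max) / 2.0 / 1000.0

    # One complete FFT: compute, call overhead and host transfer.
    per_fft_ms = fft_ms + call_ms + transfer_ms

    # Work needed to keep up with one second of input.
    per_second_ms = per_fft_ms * ffts_per_second
    loaded_ms = per_second_ms * (1.0 + overhead)
    return {
        "call_ms": call_ms,
        "per_fft_ms": per_fft_ms,
        "per_second_ms": per_second_ms,
        "loaded_ms": loaded_ms,
    }


if __name__ == "__main__":
    for name, value in ap120b_budget().items():
        print(f"{name:15s}{value:9.2f}")
```

Carried through without intermediate rounding, the sketch
gives roughly 389.4 and 584.2 milliseconds, which matches the
figures in the text to within rounding. More than half of
each second is therefore consumed before any filtering is
attempted.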

Therefore another AP-120B must be installed to ensure
that speed requirements are met. Also, since the host
computer must be interrupted many times, it may be necessary
to retain the PDP-11/34 in the signal processing subsystem.
There is also the consideration that a custom-written math
routine cannot be loaded into the math library, which will
generate considerable overhead each time it is called. (The
amount of this overhead time is system dependent.)
VII.   CONCLUSIONS AND RECOMMENDATIONS


The test bed as proposed seems to be a workable design,
although for most applications a more efficient and
economical architecture may be constructed.

For many uses the PDP-11/34 computer and the AN-5400
A/D converter seem unnecessary when the MAP-300 array
processor is used. The Ampex Megastore may be required for
a few applications but would not be suitable for the
majority of applications (including real-time) since a disk
peripheral attached to the PDP-11/70 would be cheaper and
still perform the same functions.

It is felt that the increase in complexity and possible
confusion using the MAP-300 over the AP-120B can be
overshadowed by the reduction in equipment required by the
MAP-300. This increased proficiency should be even more
greatly felt (assuming a normal learning curve) with
subsequent installations. Also, with the time saving in
execution, extra calculations could be performed on the MAP
in a real-time environment, thereby increasing efficiency,
operability and spectrum.

It is recommended that further tests be conducted using
the actual applications, data types and speed requirements
to fully evaluate the most economical and efficient minimum
design necessary.